SYSTEM AND METHOD FOR MANAGING CONTENT

CROSS-REFERENCE TO RELATED APPLICATIONS

[01] This application claims priority to U.S. Provisional Application Serial Number 
60/434,418 entitled "FILE MANAGEMENT SYSTEM AND METHOD" which was
filed on December 19, 2002, and which is incorporated herein by reference in its entirety. 
This application is also related to corresponding U.S. Patent Application entitled "System 
and Method for Managing Content Including Content Addressability Features," Attorney 
Docket Number 25396-004; U.S. Patent Application entitled "System and Method for 
Managing Versions," Attorney Docket Number 25396-005; U.S. Patent Application 
entitled "System and Method for Managing Content With Event Driven Actions to
Facilitate Workflow and Other Features" Attorney Docket Number 25396-006; and U.S. 
Patent Application entitled "Graphical User Interface for System and Method for 
Managing Content," Attorney Docket Number 25396-007, filed simultaneously herewith, 
each of which is incorporated herein by reference in its entirety. 

FIELD OF THE INVENTION 

[02] The present invention relates to an integrated system and method for managing 
files, messages and other digital content that facilitates categorization of information, 
provides version control, allows event-driven actions including control of workflow, 
permits sharing and access control of files, is transactionally-based to permit easy 
historical viewing and undoing of a wide variety of changes to files and folders and other 
features, and a graphical user interface to facilitate access to and use of such a system. 

BACKGROUND OF THE INVENTION

[03] Computers have revolutionized the storage, retrieval and use of information. As 
the cost and size of computer memory have gone down, the amount of information
accessible to a user has increased substantially. The expansion of networks, including 
global networks, such as the Internet, has also greatly contributed to this growth. This 
growth has greatly outpaced the ability of existing systems to find, share and organize 
that information. 

[04] Originally, electronic file systems were based upon simple filing concepts from 
paper files. Files were organized into folders and subfolders, just like documents in filing 
cabinets. As the number and types of files have grown, the inadequacies of the early 
systems have become increasingly apparent. In the physical environment, as the number 
of filing cabinets increased, indexing systems were developed to locate specific files or 
documents. Such systems are still used in controlling physical documents. In the 
electronic realm, similar file management systems have also developed. However, 
networks have changed the nature of file storage. A user is no longer limited to the files 
on a single computer. Instead, a single user can create, store, access, modify and copy 
files on any number of machines, including their own computer, network servers, and 
even co-workers' computers. Additionally, others on a network may be creating, copying,
and modifying those same files. The exploding use of email has also contributed to 
current problems. Emails are also retained and they need to be organized and controlled, 
so that they can be later located, accessed and used. Within existing computer filing 
systems, disorganization is rampant, and it can be hard to find things. In recent years, 
various disparate applications have emerged to solve some aspects of the problems: 
Version Control systems, Document Management systems, Workflow systems,
Configuration Management systems, Archiving systems, Backup systems, general 
purpose databases, etc. These applications are yet other places to store files, in systems 
that have to be learned, maintained, backed up, etc. 

[05] One of the many problems with existing electronic filing systems is the creation 
of copies. It is very easy to copy a file. There are also important reasons why a copy of a 
file may be better than the original, in terms of accessibility and convenience. However, 
the creation of many copies further increases the disorganization of filing systems. 
Studies have shown that most of the files on people's computers and disks are copies of 
files from other computers on the network, from read-only media, and from their own 
computer. 

[06] The creation of copies can be very confusing. The original file may be changed, 
or the copy may be changed. Then, they are no longer exact copies, but a user can easily 
lose track of which is the correct one. Many times the creator of a copy forgets about it 
or why it was created. The copy then continues to exist, using valuable storage and name 
space, but without any purpose. The vast majority of copies are not necessary. 
Therefore, a need exists for a file management system with improved performance such 
that the need for copies is limited. Furthermore, a need exists for a file management 
system that maintains information about copies of files so that their use and relationship to
other files can be easily determined.

[07] Another problem with current file systems is that different users may use different 
approaches to file organization. This leads to difficulties in finding and sharing files. 
Another problem is the way that access control and sharing are managed. The sharing
and access control features in the Windows™ operating system, for example, are very 
difficult for the average user to make sense of, to use and to maintain. An advanced user 
is typically needed to establish and maintain file sharing groups and related mechanisms. 
Improper sharing and access control may allow access to information that should not be 
disclosed, or files may be inaccessible that should be shared. Therefore, a need exists for 
a file management system that allows simple management of access control and file sharing.

[08] Locating a desired file is another complicated process in existing systems. Each
computer or disk drive is often searched separately, even though information may be 
stored on several different, interconnected, computers. Even if a search looks for a file 
on multiple computers, the search results can be misleading or incomplete. The problems 
with copies may mean that a search may produce many duplicate results and results that 
do not include the best version. The system provides little, if any, assistance in 
determining which is the proper (e.g. current) file. Therefore, a need exists for a file 
management system that allows searching on multiple computers and organizes results in 
a useful manner. 

[09] It is well known that it is advisable to maintain backup copies of files in case of 
corruption, loss, or other problems. However, there are numerous problems with backup 
systems. Often, backup systems are not installed or operated on a regular basis. 
Sometimes, backups do not succeed when scheduled. Very often, only essential servers 
are backed up; the files on individual computers typically are not regularly backed up. 
Additionally, locating and retrieving a backup file can be difficult. Therefore, a need 
exists for a file management system that simplifies the backup and restoration processes.
Other drawbacks exist. 

SUMMARY OF THE INVENTION 

[010] An object of the invention is to overcome these and other drawbacks. The present
invention substantially overcomes the deficiencies of the prior art through a novel file 
management system. According to one aspect of the invention, the file management 
system includes an object oriented file management database. The file management 
system includes a volume manager and a coherency manager. The volume manager 
manages a set of volumes. Each volume may include folders, files and other digital 
content, and it may reference other volumes. The coherency manager, among other 
things, facilitates consistency among multiple volume managers. According to another 
aspect of the invention, a novel user interface for interacting with the file management 
system is provided. 

[011] Unlike conventional file management systems, the file management system of the
present invention is content addressable and self-organizing to facilitate categorization of
information, includes a publish/subscribe capability and event-driven actions to facilitate
sharing and access control of files and workflow, and is transactionally-based to enable a
historical view showing actions performed on a file or folder and to permit restoring
files and folders to states prior to a change. As detailed below, these and other
aspects of the invention enable a number of advantageous features.

[012] According to one embodiment, implementation of the content addressability 
feature includes the use of tags. Tags are name-value pairs that describe folder or file 
attributes. Tags can have a single value or, in some cases, multiple values. According to
one aspect of the invention, some tags may be system generated tags and others may be 
user selected tags. Via the user interface, for example, by right clicking on a file or folder 
and selecting tags from a menu, a user can open a window showing the item's tag
information and can view and/or change tag information. 
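
By way of illustration only, the name-value tag model described above might be sketched as follows. The class name Item, its fields, and the example tag names are hypothetical and do not appear in this specification; the sketch merely assumes that each tag name maps to one or more values and that system generated tags are tracked separately from user selected tags.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """A file or folder carrying name-value tags (hypothetical model)."""
    name: str
    # Each tag name maps to a list so a tag can hold one value or several.
    tags: dict = field(default_factory=dict)
    # Names of tags that were generated by the system rather than the user.
    system_tags: set = field(default_factory=set)

    def set_tag(self, tag, *values, system=False):
        """Attach or overwrite a tag with one or more values."""
        self.tags[tag] = list(values)
        if system:
            self.system_tags.add(tag)
        else:
            self.system_tags.discard(tag)

    def get_tag(self, tag):
        """Return a copy of the values for a tag (empty list if absent)."""
        return list(self.tags.get(tag, []))


photo = Item("half_dome.jpg")
photo.set_tag("size_bytes", "204800", system=True)   # system generated
photo.set_tag("subject", "Yosemite", "Jane")         # user selected, multi-valued
```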

[013] According to another aspect of the invention, each volume can include one or 
more folders. A folder may be configured to be a view of the database and include 
pointers to the files associated with that view. This enables the contents of a folder to be 
constructed and maintained dynamically. According to another aspect of the invention, 
various folder types may be used. By way of example, the folder types may include one 
or more of a query folder, a search folder, a merge folder, a magnetic folder, a typed 
folder and other types of folders.

[014] A query folder is a folder that generates a query (e.g., based on the folder name or 
based on a tag attached to the folder, or otherwise) into the file management database. A 
query folder encapsulates a set of search criteria and includes real-time-updated results of 
the search. If a file is later changed so that it matches the query, it will be added to the 
corresponding query folder. Similarly, if a file is later changed so that it no longer 
matches the query, it will be removed. The search can be a full-text search across one or 
more volumes, or it can be a tag search, where the query searches tags that have certain 
values. Other search techniques may also be used. Matching objects are then associated 
with that query folder. 
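
As a non-limiting sketch, the real-time behavior of a query folder based on a tag search might be modeled as shown below. The classes QueryFolder and Database and their methods are assumptions introduced for illustration; the specification does not prescribe this structure. The sketch re-evaluates every registered query folder whenever an item is saved, so that items enter and leave the folder as their tags change.

```python
class QueryFolder:
    """Folder whose membership is the live result of a tag query (hypothetical)."""

    def __init__(self, name, tag, value):
        self.name = name
        self.tag = tag
        self.value = value
        self.members = set()

    def matches(self, tags):
        return self.value in tags.get(self.tag, [])

    def on_change(self, item_name, tags):
        # Add the item if it now matches; remove it if it no longer does.
        if self.matches(tags):
            self.members.add(item_name)
        else:
            self.members.discard(item_name)


class Database:
    """Minimal stand-in for the file management database (hypothetical)."""

    def __init__(self):
        self.items = {}          # item name -> {tag name: [values]}
        self.query_folders = []

    def add_query_folder(self, folder):
        self.query_folders.append(folder)
        for name, tags in self.items.items():
            folder.on_change(name, tags)

    def save(self, name, tags):
        self.items[name] = tags
        for folder in self.query_folders:
            folder.on_change(name, tags)


db = Database()
drafts = QueryFolder("status=draft", tag="status", value="draft")
db.add_query_folder(drafts)
db.save("report.doc", {"status": ["draft"]})   # now matches, so it is added
```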

[015] A search folder is a folder that has associated with it search criteria for searching
contents of files or other digital objects. Matching objects are then associated with that 
search folder. According to one aspect of the invention the volume manager supports 
integration with free-text search software. When any application changes the contents of 
a file (or folder), the normal sequence is for the file to be opened, written to, and then 
closed. The volume manager processes each of these requests. When it determines that a 
file has changed, a sequence of actions is processed. One of these actions can include 
queuing the file to a search engine for indexing. In a similar way, immediately after a file 
is erased, a request to remove the file from the index is queued to the search engine. 
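
The open, write, close sequence and the resulting indexing requests might be sketched as follows. This is an illustrative assumption only: the class IndexingVolumeManager, its method names, and the tuple format of the queued requests are hypothetical, and a real search engine interface is not shown.

```python
from collections import deque


class IndexingVolumeManager:
    """Queues index requests when files change or are erased (hypothetical)."""

    def __init__(self):
        self.files = {}
        self.dirty = set()            # paths written since they were last closed
        self.index_queue = deque()    # requests waiting for the search engine

    def write(self, path, data):
        self.files[path] = data
        self.dirty.add(path)

    def close(self, path):
        # Only a file that actually changed is queued for (re)indexing.
        if path in self.dirty:
            self.dirty.discard(path)
            self.index_queue.append(("index", path))

    def erase(self, path):
        self.files.pop(path, None)
        self.dirty.discard(path)
        # Immediately ask the search engine to drop the file from its index.
        self.index_queue.append(("remove", path))


vm = IndexingVolumeManager()
vm.write("notes.txt", b"first draft")
vm.close("notes.txt")
vm.close("notes.txt")      # unchanged since last close: nothing queued
vm.erase("notes.txt")
```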

[016] According to one embodiment, the system recognizes folders with specially
formed names, or with special tags, as being search folders or query folders. When such 
a folder is recognized, a search string is extracted from the folder name or from specific 
tags, and passed to a search engine. The results of the search are shown as familiar files- 
in-folders. If the search query is presented in the form of a folder name or a tag value, it 
is persistent. The search strings can include complex search expressions, including 
boolean operations. When a file is created or is changed so that it matches an active 
search folder, the name of the file will appear in that folder without any additional 
intervention by the user. Files can also be specially marked to prevent indexing. Other 
aspects of searching are facilitated by the invention. 

[017] A merge folder is a folder (or overlay) that combines two or more folders (e.g.,
using boolean logic or otherwise). A merge folder can include items from a 'merge list' 
of other folders. An item in a folder in the merge list hides a like-named item in a folder 
farther down in the merge list. According to one embodiment, the merge is real-time, not 
a snapshot. As items appear and disappear in the merged folders, they appear and
disappear in the merge folder contents. A merge folder can be configured to allow 
creation of new items in the first folder in the merge list, and it can be configured to 
allow the system to delete items from where they reside or merely to hide them from 
appearing in the merge folder. Items from the source folders can appear in the merge 
folder as sync links. Preferably, the system uses a combination of query folders and 
merge folders to implement one form of complex queries. 
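
The hiding rule of the merge list might be sketched, purely as an example, with the function below. The function name merge_view and the use of plain dictionaries to stand in for folders are assumptions. Because the view is computed from the source folders each time it is requested, it reflects their current contents rather than a snapshot.

```python
def merge_view(merge_list):
    """Return {item name: (index of source folder, item)}.

    An item in an earlier folder of the merge list hides a like-named
    item in any folder farther down the list.
    """
    view = {}
    for index, folder in enumerate(merge_list):
        for name, item in folder.items():
            view.setdefault(name, (index, item))   # first occurrence wins
    return view


project = {"notes.txt": "project notes"}
shared = {"notes.txt": "shared notes", "plan.txt": "shared plan"}
view = merge_view([project, shared])
```

Here view["notes.txt"] comes from the first folder, while "plan.txt" shows through from the second. Removing "notes.txt" from the first folder and calling merge_view again exposes the previously hidden item.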

[018] A magnetic folder "attracts" files with certain tag values. For example, a magnetic
folder may disable automatic removal, so that a file that has ever matched a query or other
criteria remains in the folder.

[019] Typed folders are folders that include files or other content that have certain 
characteristics. For example, a typed folder can limit what types of files can be located in 
the folder (e.g., only PDF files), it can prevent certain types of files from being located in 
the folder and can require certain content. For example, a 'Group Role' folder can be 
allowed to include only 'User' files and 'Group Access' folders. 
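
One possible way to enforce a typed folder that restricts file types is sketched below. The class TypedFolder and the choice of filename suffixes as the type test are assumptions made for illustration; the specification does not state how a file's type is determined.

```python
from pathlib import PurePath


class TypedFolder:
    """Folder that accepts or rejects files by type (hypothetical sketch)."""

    def __init__(self, name, allowed_suffixes=None, blocked_suffixes=()):
        self.name = name
        # None means "no allow-list"; otherwise only these suffixes may be added.
        self.allowed = None if allowed_suffixes is None else set(allowed_suffixes)
        self.blocked = set(blocked_suffixes)
        self.items = []

    def add(self, filename):
        suffix = PurePath(filename).suffix.lower()
        if suffix in self.blocked:
            raise ValueError(f"{suffix} files may not be placed in {self.name}")
        if self.allowed is not None and suffix not in self.allowed:
            raise ValueError(f"{self.name} accepts only {sorted(self.allowed)}")
        self.items.append(filename)


papers = TypedFolder("Papers", allowed_suffixes={".pdf"})
papers.add("claims.PDF")     # accepted: suffix comparison ignores case
```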

[020] According to another aspect of the invention, changes to folders and files are 
handled on a transactional basis. This enables the system to retain information regarding 
the creation, modification, and uses of a file or its attributes, maintains information 
regarding relationships between files, controls access to files based upon the stored 
information and provides other advantages. This aspect of the invention facilitates an 
item history feature. Each time an item is copied, moved, deleted, saved, renamed, etc., 
the volume manager keeps a record of one or more of what was done, by whom, when, 
why and other desired information. This information may be seen by choosing an item 
(e.g., by right-clicking the item from the user interface) and selecting "Show History." In
some embodiments, this brings up a window that shows one or more of where this item 
was copied from and to, who did it, when, why and other desired information. The Item 
History for a folder can also include a list of items that used to be in the folder but which 
were either deleted or moved from the folder. The user can open and explore these items 
if desired (they will be frozen as discussed below). These items can be restored by
selecting 'Undelete' or 'Bring back' from a menu.

[021] An 'undo' option lets a user undo other previous commands. When a user right 
clicks on a file or folder and selects the 'Undo...' menu item, this brings up a dialog box
that describes a list of things done to the item and the option to undo one or more of 
them. The undo feature applies to whole folder hierarchies as well as to individual or 
collections of files. Other changes to files and folders can be viewed and undone in 
accordance with the present invention. 
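
A minimal sketch of transactional record keeping with undo is given below, under the assumption that each change records what was done, by whom, when and why, together with enough state to reverse it. The names HistoryRecord and TransactionalFolder are hypothetical, and the sketch tracks only item names, not file contents.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class HistoryRecord:
    action: str
    who: str
    when: datetime
    why: str
    before: Optional[str]   # item name before the change, if any
    after: Optional[str]    # item name after the change, if any


class TransactionalFolder:
    """Folder whose changes are logged and can be undone (hypothetical)."""

    def __init__(self):
        self.names = set()
        self.history = []

    def _apply(self, action, who, why, before, after):
        if before is not None:
            self.names.remove(before)
        if after is not None:
            self.names.add(after)
        when = datetime.now(timezone.utc)
        self.history.append(HistoryRecord(action, who, when, why, before, after))

    def create(self, name, who, why=""):
        self._apply("create", who, why, None, name)

    def rename(self, old, new, who, why=""):
        self._apply("rename", who, why, old, new)

    def delete(self, name, who, why=""):
        self._apply("delete", who, why, name, None)

    def undo_last(self):
        """Reverse the most recent change and return its record."""
        record = self.history.pop()
        if record.after is not None:
            self.names.discard(record.after)
        if record.before is not None:
            self.names.add(record.before)
        return record


folder = TransactionalFolder()
folder.create("draft.txt", who="ann")
folder.rename("draft.txt", "final.txt", who="bob", why="approved")
```

Calling folder.undo_last() at this point would reverse the rename, which is the kind of change, beyond simple undeletion, that the history makes reversible.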

[022] The system further permits a user to select a 'Show versions' menu item. This 
displays all extant past versions, which are all frozen. The user can drag these versions to 
somewhere, open them, compare them with other versions, or perform other file 
operations. They are just files and folders (except they're frozen). To make a previous 
version become the latest, most current version again, the user can right click on an old 
version and select the 'Make Current' command. The item will then be reinstated as the 
current version. 

[023] These features facilitate simple tasks like undeleting a file but also provide a
broader range of novel features including the ability to undo a renaming of a file or folder 
and other changes made to the file or folder. 

[024] Another feature accessible from the user interface is the ability to freeze files or 
folders. When a file is frozen, both the contents of the file and the tags attached to it are 
made permanently read-only. A file or a folder and all of its contents (recursively) can be 
frozen. When this occurs, no one, not even a super-user or administrator can make it 
modifiable. Yet it can still be read. When an item is frozen, the user can be assured that 
the item is truly a snapshot taken when it says it was taken and that everything in it is as it 
was, nothing added, nothing changed, nothing removed.

[025] According to one embodiment, every file has an inspectable cryptographically- 
strong hash code (using the SHA-1 algorithm, for example). The user interface permits 
verification so that this hash code can be used to verify that the content really is intact, 
and that no error or hacking has changed the content. The hash code may also be used 
for digital signatures. 
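
For example, verification against a recorded SHA-1 hash code might look like the following sketch, which uses the hashlib module of the Python standard library. The function names content_hash and verify are assumptions, as is the idea that the recorded hash is stored as a hexadecimal string.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Return the SHA-1 digest of the content as a hexadecimal string."""
    return hashlib.sha1(data).hexdigest()


def verify(data: bytes, recorded_hash: str) -> bool:
    """Report whether the content still matches the hash recorded for it."""
    return content_hash(data) == recorded_hash


original = b"quarterly figures, frozen copy"
recorded = content_hash(original)      # stored when the file was saved
intact = verify(original, recorded)
tampered = verify(original + b"!", recorded)
```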

[026] Another aspect of the invention relates to versioning and saving. The system 
permits saving a file from an unmodified application, or a user can choose the 'Save as 
Version' menu item. The 'Save as Version' command takes a snapshot of an item by 
making a copy of it, freezing the copy so it will never change, and associating it with 
other past versions of the item. A user can access any past version and copy it, link to it, 
or move it, but it can't be modified, since it will be frozen. When a snapshot is 
performed, the volume manager also records who, when, and optionally, why (if a user 
chooses to supply a comment or have the system do so automatically). Taking a snapshot
of a folder is similar except that the volume manager saves a frozen copy of everything 
under the folder. 
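
The relationship between 'Save as Version', freezing, and 'Make Current' might be sketched as follows. The classes VersionedFile and FrozenError are hypothetical, and the sketch covers a single file only; a folder snapshot would apply the same idea recursively.

```python
from datetime import datetime, timezone


class FrozenError(Exception):
    """Raised when something tries to modify a frozen item."""


class VersionedFile:
    """File with frozen past versions (hypothetical sketch)."""

    def __init__(self, content, frozen=False):
        self.content = content
        self.frozen = frozen
        self.versions = []      # records of past snapshots, oldest first

    def write(self, content):
        if self.frozen:
            raise FrozenError("frozen items are permanently read-only")
        self.content = content

    def save_as_version(self, who, why=""):
        # Copy the current content, freeze the copy, and record who/when/why.
        snapshot = VersionedFile(self.content, frozen=True)
        self.versions.append({
            "file": snapshot,
            "who": who,
            "when": datetime.now(timezone.utc),
            "why": why,
        })
        return snapshot

    def make_current(self, snapshot):
        """Reinstate the content of a past version as the current content."""
        self.write(snapshot.content)


doc = VersionedFile("first draft")
v1 = doc.save_as_version(who="ann", why="before review")
doc.write("second draft")
```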

[027] Another aspect of the invention relates to event driven actions including triggers 
and constraints. Anything done to a file or a folder can be an event that can trigger an 
action. A constraint can be a required event or condition that must occur or exist before a 
certain action can occur. For example, it can prevent a file from being published before 
certain approvals are obtained. Numerous other uses exist for triggers and constraints. 
To use this feature, a user can select from many pre-programmed actions and customize
them with drag and drop and form fill-in. In some embodiments, actions can be
programmed by the user. The combined result of all programmed actions enables the 
system to react in real time. As an example, the system uses event-driven actions to 
notify the right people when a work product file is ready for them to review or to use in 
some other part of a project. Using event-driven actions, a user can build complex 
workflow automation into folders and files. 
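
As one hedged illustration, triggers and constraints might be registered per event as shown below. The class EventFolder, the event name "publish", and the "approved" field are assumptions chosen to mirror the approval example above; they are not taken from the specification.

```python
from collections import defaultdict


class EventFolder:
    """Folder with event-driven triggers and constraints (hypothetical)."""

    def __init__(self):
        self.triggers = defaultdict(list)      # event -> actions to run
        self.constraints = defaultdict(list)   # event -> checks that must pass

    def add_trigger(self, event, action):
        self.triggers[event].append(action)

    def add_constraint(self, event, check):
        self.constraints[event].append(check)

    def perform(self, event, item):
        """Run the event if every constraint holds; return whether it ran."""
        for check in self.constraints[event]:
            if not check(item):
                return False
        for action in self.triggers[event]:
            action(item)
        return True


notified = []
folder = EventFolder()
folder.add_constraint("publish", lambda item: item.get("approved", False))
folder.add_trigger("publish", lambda item: notified.append(item["name"]))

first_try = folder.perform("publish", {"name": "spec.doc"})
second_try = folder.perform("publish", {"name": "spec.doc", "approved": True})
```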

[028] Another feature of the user interface is the ability to easily manipulate lists. 
According to this aspect of the invention, in list view, a user can sort by column as usual, 
but in addition, can configure any column to show the contents in 'my order'. When the 
folder display is in this mode, a user can rearrange the order of folder items using drag 
and drop techniques. The folder subsequently remembers the user's ordering. 

[029] Various aspects of the volume manager and coherency manager facilitate various 
other aspects of the invention. One such aspect of the invention relates to smart copies. 
The volume manager eliminates many scenarios that would have necessitated making
copies. The primary scenario where a true copy is useful is where a user wants to modify 
one copy in one way and another copy in another way. For these and other reasons, the 
smart copy feature of the volume manager encompasses several enhancements over
traditional file copies. According to one embodiment of this aspect of the invention, the
system permits live copies and deferred copies and provides other copy-related
benefits.

[030] According to this aspect of the invention, when the system makes a live copy of a
file named A to a file named B it makes both A and B refer to the same underlying file. 
If a user modifies file A, file B reflects the change immediately. Deleting file A or B has 
no effect on the other file. If a new version of one file is made, then the other filename 
will refer to that new version. The coherency manager permits live copies to be on 
different volumes. Live copies can refer to folders as well as files. 

[031] The live copy feature facilitates organization of data, in part, because it lets a user 
put the same file or folder inside more than one folder. For example, a photo can be in 
both the Yosemite folder and the Jane folder. In reality, the folders each include a 
reference to the same physical file. So if the photo is changed, the change will be 
reflected in the "copy" in each folder. 
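
The live copy behavior might be sketched as two names bound to one underlying object. The classes Blob and LiveVolume are hypothetical, and the sketch is limited to a single volume; live copies across volumes, which rely on the coherency manager, are not modeled.

```python
class Blob:
    """The single underlying file that one or more names refer to."""

    def __init__(self, data):
        self.data = data


class LiveVolume:
    """Volume in which several names can share one underlying file (hypothetical)."""

    def __init__(self):
        self.names = {}     # path -> Blob

    def create(self, path, data):
        self.names[path] = Blob(data)

    def live_copy(self, src, dst):
        # Both names now refer to the same underlying object.
        self.names[dst] = self.names[src]

    def write(self, path, data):
        self.names[path].data = data

    def read(self, path):
        return self.names[path].data

    def delete(self, path):
        # Removing one name leaves the other name and the data untouched.
        del self.names[path]


vol = LiveVolume()
vol.create("Yosemite/photo.jpg", b"original pixels")
vol.live_copy("Yosemite/photo.jpg", "Jane/photo.jpg")
vol.write("Yosemite/photo.jpg", b"retouched pixels")
```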

[032] Another aspect of the invention relates to deferred copies. When the system 
makes a "regular" copy of an original file named A to a copy named B, the volume 
manager knows that the names refer to copies of the same file. This uses only a small 
amount of additional disk space. Initially both the original item and the "copy" share the 
same data. However, at the time that a user modifies either the file called A or the one
called B, the volume manager will make a copy of the single underlying file, and each of 
the two names will refer to its own separate data. This applies to files, folders and other 
items. In the case of folders, only when files are modified in one or the other copy does 
the volume manager actually need to allocate space for the new, modified copy. 
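
A deferred copy of a file can be illustrated with a simple copy-on-write scheme using reference counts, as sketched below. The class DeferredCopyVolume and its bookkeeping are assumptions; the specification does not state how shared data is tracked internally.

```python
class DeferredCopyVolume:
    """Copies share data until one of them is modified (hypothetical sketch)."""

    def __init__(self):
        self.blobs = {}       # blob id -> data
        self.refcount = {}    # blob id -> number of names sharing the blob
        self.names = {}       # file name -> blob id
        self._next_id = 0

    def _new_blob(self, data):
        blob_id = self._next_id
        self._next_id += 1
        self.blobs[blob_id] = data
        self.refcount[blob_id] = 1
        return blob_id

    def create(self, name, data):
        self.names[name] = self._new_blob(data)

    def copy(self, src, dst):
        # No data is duplicated yet; the new name just shares the blob.
        blob_id = self.names[src]
        self.names[dst] = blob_id
        self.refcount[blob_id] += 1

    def write(self, name, data):
        blob_id = self.names[name]
        if self.refcount[blob_id] > 1:
            # First modification of a shared blob: give this name its own data.
            self.refcount[blob_id] -= 1
            self.names[name] = self._new_blob(data)
        else:
            self.blobs[blob_id] = data

    def read(self, name):
        return self.blobs[self.names[name]]


vol = DeferredCopyVolume()
vol.create("A", b"shared contents")
vol.copy("A", "B")
blobs_after_copy = len(vol.blobs)      # still one underlying blob
vol.write("B", b"B's own contents")
```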

[033] After copying file A to a new file B, very little additional disk space is needed 
because of the deferred copy feature. File A will remember that it was copied to file B, 
and file B will remember that it was copied from file A. This information can be seen in 
the user interface and it can be used to navigate from one copy to another. File A and file 
B share the same list of previous versions. If we modify A and then also modify B, the 
current versions will differ, but both still share all of the same previous versions. 
Normally, when a file is copied, the copy is associated with the same current version and 
all the same previous versions. But if desired, a user can copy a past version of A to a 
new file C, and then modify C. Now A and C differ, but the ancestry they share is the 
same up to the point where the copy was made. 

[034] Another aspect of the invention relates to smart links. Windows has shortcut
files. Mac OS has alias files. Unix has symbolic links and hard links. The invention 
supports these features and more. A link is a reference to whatever is at the end of the 
given path. The path can be relative, absolute, or it can be a URL. With adequate 
permissions, a user can make the link "sticky." A sticky link gets to dictate attributes of 
what it points to: the file type (such as a PDF file), whether there has to always be 
something there at the end of the path, and whether the link will adjust to point to the new 
location if the reference moves. A link can be configured to behave like a Mac OS alias, 
Windows shortcut, or Unix symbolic link or hard link, appropriate to the platform from
which it is accessed. A link can also be configured to keep a cached copy of whatever 
was there the last time the link was used. The link might include a cached copy of a 
remote web page or a folder on a remote web site, for example. 

[035] Another aspect of the invention relates to a smart caching feature. When a user 
accesses volume A on server X from client machine Y, the volume manager on machine 
Y creates an entry for volume A in its local disk cache. From then on, even if the user 
disconnects from server X, the user can still work on volume A from client machine Y,
using whatever is cached locally. Preferably, the user can request that certain files from 
volume A will always be cached on their client machine, in case they disconnect or in 
case the server goes down. To do this, the user can select an item on volume A, right 
click, and then select the 'Keep local' menu item from a pop-up menu. If the user sets
'Keep local' on a folder, all of that folder's contents, recursively, are affected. If the user
also wants to protect against the item being deleted, the system can make a Live Copy. 

[036] The volume manager on client machine Y works unobtrusively in the background 
to ensure that 'keep local' items remain in sync with the server. If the user disconnects Y 
from the network then reconnects, the volume manager will synchronize the cache with 
the server. If the user made any changes in the local cache while disconnected, there may 
be conflicts with changes on the server. In this case, the user interface will help the user 
reconcile differences. The user interface's compare-merge tools facilitate this. 
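
The specification does not describe how conflicts are detected on reconnection. One common approach, offered here only as an assumption, is a three-way comparison between the state last synchronized (the base), the local cache, and the server. The function name reconcile and the use of content hashes as dictionary values are hypothetical.

```python
def reconcile(base, local, server):
    """Classify each path after a period of disconnected work.

    Each argument maps a path to a content hash (a missing key means the
    item does not exist on that side).
    """
    result = {}
    for path in set(base) | set(local) | set(server):
        b, l, s = base.get(path), local.get(path), server.get(path)
        if l == s:
            result[path] = "in sync"
        elif l == b:
            result[path] = "take server"     # only the server changed
        elif s == b:
            result[path] = "push local"      # only the local cache changed
        else:
            result[path] = "conflict"        # both changed, differently
    return result


base = {"a.txt": "h1", "b.txt": "h1", "c.txt": "h1", "d.txt": "h1"}
local = {"a.txt": "h1", "b.txt": "h2", "c.txt": "h1", "d.txt": "h2"}
server = {"a.txt": "h1", "b.txt": "h1", "c.txt": "h3", "d.txt": "h4"}
plan = reconcile(base, local, server)
```

Only the paths classified as "conflict" would need to be presented to the user in a compare-merge tool.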

[037] Another aspect of the invention relates to a smart backup feature. The volume
manager handles backups in an automated way. As files are changed, they are sent over 
the network to another machine running a copy of the volume manager, which has been
designated as the 'backup server'. The versioning features make a volume an ideal store 
for backups because it has adequate expressive power to accurately represent the history 
of the backed-up data. Also, the system's transactional characteristics are ideal for
backup because the backup can be guaranteed to be a consistent snapshot. 

[038] Backups happen continuously, slowing down only when there's nothing to do or 
to get out of the way while a user is using his computer. Whenever there is idle time, at 
night, at lunch, while a user is on the phone, backups can go at full speed. 

[039] To arrange for backup of a folder, the user right-clicks on the folder and selects 
the "Backup..." menu item. The user then designates a folder on another volume where
he wants there to be a redundant copy of this folder and its versions from now on. 
Features in the user interface will assist the user in locating a volume manager on their 
network that is an appropriate receptacle for their backups. Such a machine would often 
be (but does not have to be) a dedicated, unattended server (called a 'backup drone'), 
shared by multiple users. The user interface will also help the user identify an 
appropriate place to store their files on the backup machine. For example, there could be 
a specific part of the backup machine's folder hierarchy that has been designated for 
backups. Typically, the folder being backed up will be the root folder of a volume. The 
backup drone will generally be up and connected 24x7. It may have RAID disks, it may 
be a member of a Cluster, and it may in turn back up to another drone off-site. 

[040] Backups are useful for at least two classes of problems: disaster recovery and 
undo. Disaster recovery is easily handled by copying an entire folder or volume from 
backup as of the most recent backup. Undo allows a user to retrieve deleted items and
past versions of modified items. As discussed earlier, undo of recent deletions and 
modifications doesn't require backup, since the volume manager keeps recent versions on 
the local disk. Eventually, however, enough old versions may accumulate on the local 
disk that the volume manager will need to delete some of them, counting on a backup 
volume to supply the data if it's needed. If an undo involves data that has been deleted 
from the local volume, the user interface transparently retrieves the needed data from the 
backup volume. The undo operation is a little slower, but otherwise operates similarly. 
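
The fallback from the local disk to the backup volume might be sketched as follows. The class VersionStore, the keep_local limit, and the use of a dictionary to stand in for the backup volume are assumptions for illustration; the actual pruning policy is not specified.

```python
class VersionStore:
    """Keeps recent versions locally and older ones only on backup (hypothetical)."""

    def __init__(self, backup, keep_local=2):
        self.local = {}             # version id -> data, oldest first
        self.backup = backup        # stands in for the backup volume
        self.keep_local = keep_local

    def add(self, version_id, data):
        self.local[version_id] = data
        self.backup[version_id] = data          # backed up as changes occur
        while len(self.local) > self.keep_local:
            oldest = next(iter(self.local))     # dicts preserve insertion order
            del self.local[oldest]

    def get(self, version_id):
        """Return (data, source), fetching from backup if pruned locally."""
        if version_id in self.local:
            return self.local[version_id], "local"
        return self.backup[version_id], "backup"


backup_volume = {}
store = VersionStore(backup_volume, keep_local=2)
store.add("v1", b"one")
store.add("v2", b"two")
store.add("v3", b"three")     # v1 is pruned from the local disk
```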

[041] As can be seen, these various features, functioning together, permit great synergy 
and provide unique functionality not heretofore believed to be known. By way of 
example, the freezing feature is particularly beneficial to reliably storing past versions. 
The deferred copies feature makes the folder snapshot feature practical because it 
requires minimal disk space. Another useful versioning feature is the ability to view a 
folder hierarchy or an entire volume as of a given time. This 'as of' view uses frozen
items. Various other synergies exist.

BRIEF DESCRIPTION OF THE DRAWINGS 

[042] Fig. 1 illustrates complexity in access control associated with a conventional 
system.

[043] Fig. 2 illustrates a server system that can utilize a file management system 
according to an embodiment of the present invention. 

[044] Fig. 3 illustrates various components of a file management system according to an 
embodiment of the present invention. 

[045] Fig. 4 illustrates communications in a file management system according to an
embodiment of the present invention. 

[046] Fig. 5 illustrates a block diagram of a file management system according to an 
embodiment of the present invention. 

DETAILED DESCRIPTION 

[047] Fig. 2 illustrates a computer system 100 to which the file management system of
the present invention can be applied. As illustrated in Fig. 2, the computer system 100
includes a server 110 and a terminal device 120. The terminal device 120 may be a
computer. Alternatively, it may be any other device which can communicate with the 
server in order to access files, such as a PDA, an MP3 player, a cellular phone, an electronic
gaming system, etc. The server 110 includes at least one memory volume 111 and at 
least one volume manager 112. The terminal device 120 is connected to the server 110 
by a wired or wireless communication link 130 in order to access data on the server 110.
The communication link 130 connects to the volume manager 112 in order to access the
memory volume 111 on the server. Alternatively, the terminal device 120 may include
its own volume manager 121 for directly accessing the memory volume 111 on the server
110. Preferably, the volume manager 112 is a software application operating on the CPU
of the server which provides functionality as discussed below. Alternatively, the volume 
manager 112 may be implemented in hardware or operate on a machine separate from 
that having the memory. 

[048] Fig. 3 illustrates components of a software application providing the functionality 
of the file management system according to an embodiment of the present invention. The 
file management system includes a user interface 210, a volume manager 220 and a
coherency manager module. Other software modules may be used and functionality 
described herein as being performed by one module may in some cases be performed in 
whole or in part by another module. The various software modules may be installed on 
each computer or other device which utilizes the file management system o f the present 
invention and on one or more servers or central computers. These software modules may 
operate in conjunction with existing software on those machines. In particular, the user 
interface 210 and the volume manager 220 function in connection with the existing file 
: system on the computer, for example, a Windows file system 25 1 . The user interface 2 1 0 
includes at least one of two alternative components: a set of plug-in extensions 21 1 to 
Windows Explorer 250 (or other such application) and a separate user interface 
application 212. The plug-in extensions 21 1 allow users to access the functionality of the 
novel file management system utilizing familiar formats and displays (e.g., within a 
Windows Explorer or other environment). The user interface application 212 provides an 
alternative interface and may include additional functionality: Also, the user interface 
application can be used for devices which do not include Windows Explorer. 

[049] In one embodiment, a volume is a unit of file storage typically associated with a 
disk partition, or with a Windows 'drive letter'. This embodiment utilizes specific 
memory volumes created for use with the file management system. In some 
embodiments of the invention, a memory volume 111 within the present invention can be 
a physical volume, residing on a disk partition initialized for use with the file 
management system. In other embodiments, memory volume 111 may be a virtual 
volume whose data is stored inside a hidden folder on an existing OS volume, such as 
NTFS 252 in a Windows file system 251. The volume manager 221 manages the 
contents of one or more memory volumes 111. 

[050] The volume manager 221 may be enabled for network access. A proprietary 
protocol is used to communicate with the volume manager 221. Fig. 4 illustrates the 
components of a file management system enabled for network access. A TCP/IP 
connection is used to communicate with the various components operating on the 
memory. The volume manager 221 connects to a client over a TCP/IP connection, using 
a unique file protocol. A Windows file protocol 254 may be used to communicate with a 
Windows file sharing application 253 for control of data not within the file management 
system of the present invention. The protocol may be implemented in Extensible Markup 
Language (XML), with variations and enhancements that include HTTP, Java Remote 
Method Invocation (RMI) and raw binary streams. The protocol stream may be 
compressed and/or encrypted. A group of servers may be used to replicate the same data 
and appear to users as a single server, to provide high availability and improved 
throughput. 

[051] The volume manager 221 operates on the memory volume 111 to provide certain 
functionality. The user interface 210 allows a user to access the functionality. The 
volume manager 221 is able to provide the functionality through specific control of 
information in the database relating to the memory volume 111 and through 
synchronization and linking processes. The functionality of the volume manager 221 is 
described below. 

[052] According to one embodiment, the volume manager 221 may create live copies of 
files. A file named A can be live copied to a file named B, and then either file A or file B 
can be live copied again to a file named C. The underlying data referenced by the three 
different filenames is the same. So a change to any one of the files will result in those 
changes being immediately visible through any of the live copies. However, deletion of 
one copy does not delete any other copies. The live copies are associated in the database 
of the volume manager 221. 
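
The live copy behavior described above can be illustrated with a minimal sketch. The sketch assumes a toy in-memory model in which file names map to shared data objects; the class name Volume and its methods are hypothetical and are not taken from the described system.

```python
class Volume:
    """Toy model (hypothetical): names map to object ids, ids map to shared content."""

    def __init__(self):
        self._names = {}      # path -> object id
        self._objects = {}    # object id -> bytearray content
        self._next_id = 0

    def create(self, path, data=b""):
        oid = self._next_id
        self._next_id += 1
        self._objects[oid] = bytearray(data)
        self._names[path] = oid

    def live_copy(self, src, dst):
        # Both names now reference the same underlying data object.
        self._names[dst] = self._names[src]

    def write(self, path, data):
        # Replace the shared content in place; every live copy sees the change.
        self._objects[self._names[path]][:] = data

    def read(self, path):
        return bytes(self._objects[self._names[path]])

    def delete(self, path):
        oid = self._names.pop(path)
        # Free the data only when no remaining name references it.
        if oid not in self._names.values():
            del self._objects[oid]


vol = Volume()
vol.create("A", b"draft")
vol.live_copy("A", "B")
vol.live_copy("B", "C")
vol.write("C", b"final")   # change made through one live copy
vol.delete("A")            # deleting one copy leaves the others intact
```

In this sketch, a write through C is visible through B, and deleting A removes only the name A.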

[053] According to one embodiment, the live copies can be located in different folders. 
Thus, multiple copies of files can be organized in different manners while maintaining 
the same content. Since all files are managed by the volume manager 221, live copies 
also can be located in different volumes. Additionally, live copies are not limited to files. 
Folders may also be live copies. A folder named X can be live copied to a folder named Y. 
Thus, folder X and folder Y would reference the same underlying data object. This has 
the effect that changes to folder X would immediately become visible through folder Y. 
This includes adding new files to the folder, renaming files included in the folder, or 
deleting files from the folder. 

[054] The volume manager 221 saves disk space and gains performance by utilizing 
deferred copies. According to one embodiment, when a "regular" copy is made of a file 
or folder, the file or folder's contents are not immediately duplicated. Only a small 
amount of additional disk space is needed for the information in the database regarding 
the new files or folders. Both copies share the same data. Only after the data in one of 
the files is modified, does the volume manager 221 create separate data. The same 
applies to copies of an entire folder hierarchy: only when files are modified in one or the 
other copy does the volume manager 221 actually allocate space for the new, modified 
copy. 
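
A deferred copy can be sketched as a copy-on-write scheme. The following is a minimal illustration only; it assumes a reference count per stored blob, and the names DeferredCopyStore, copy and blob_count are hypothetical.

```python
class DeferredCopyStore:
    """Toy copy-on-write store (hypothetical names throughout)."""

    def __init__(self):
        self._blobs = {}     # blob id -> bytes
        self._refs = {}      # blob id -> number of names sharing the blob
        self._names = {}     # path -> blob id
        self._next_id = 0

    def _new_blob(self, data):
        bid = self._next_id
        self._next_id += 1
        self._blobs[bid] = bytes(data)
        self._refs[bid] = 1
        return bid

    def create(self, path, data):
        self._names[path] = self._new_blob(data)

    def copy(self, src, dst):
        # A "regular" copy records only a new name; the data is shared for now.
        bid = self._names[src]
        self._names[dst] = bid
        self._refs[bid] += 1

    def write(self, path, data):
        bid = self._names[path]
        if self._refs[bid] > 1:
            # First modification of shared data: split off separate data.
            self._refs[bid] -= 1
            self._names[path] = self._new_blob(data)
        else:
            self._blobs[bid] = bytes(data)

    def read(self, path):
        return self._blobs[self._names[path]]

    def blob_count(self):
        return len(self._blobs)


store = DeferredCopyStore()
store.create("report.doc", b"v1")
store.copy("report.doc", "report-copy.doc")
blobs_after_copy = store.blob_count()     # still one stored blob
store.write("report-copy.doc", b"v2")
blobs_after_write = store.blob_count()    # a second blob exists only now
```

Unlike a live copy, the two names diverge once either one is modified; until then the copy costs only a database entry.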

[055] According to one embodiment, the user interface 210 can be used to tell the 
volume manager 221 to freeze a file. Once a file or folder is frozen, no one, not even a 
super-user or administrator, can modify or change the state of that file or folder. Thus, 
frozen files provide a snapshot of the file as of the indicated time. Furthermore, every 
file, including those that are frozen, has an inspectable cryptographically-strong hash 
code (using the SHA-1 hash algorithm, for example). The hash code can be used to 
verify that the content really is intact, and that no error or tampering has changed the 
content. The hash code may also be used for digital signatures. 
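
As an illustration of such a check, the sketch below computes a SHA-1 hash with Python's standard hashlib module and compares it with a previously recorded value. The function names content_hash and is_intact are assumptions made for this example.

```python
import hashlib


def content_hash(data: bytes) -> str:
    # SHA-1 is used here only because the text names it as an example.
    return hashlib.sha1(data).hexdigest()


def is_intact(data: bytes, recorded_hash: str) -> bool:
    # Content is considered intact when its hash matches the recorded one.
    return content_hash(data) == recorded_hash


frozen = b"contract text"
recorded = content_hash(frozen)   # stored when the file is frozen
```

Any later change to the content, even a single byte, yields a different hash and causes is_intact to return False.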

[056] A file's hash code can also be used to identify identical content. According to one 
embodiment, the volume manager may identify files with identical content, and link them 
together as deferred copies, thereby allowing the duplicate disk space to be freed. 
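
One possible way to find candidates for such linking is to group files by hash code, as sketched below. The helper name find_duplicates and the sample paths are hypothetical; a cautious implementation might also compare the bytes of matching files before linking them.

```python
import hashlib
from collections import defaultdict


def find_duplicates(files):
    """Group paths by content hash; return only groups with more than one path."""
    by_hash = defaultdict(list)
    for path, data in files.items():
        by_hash[hashlib.sha1(data).hexdigest()].append(path)
    return [sorted(paths) for paths in by_hash.values() if len(paths) > 1]


files = {
    "/a/logo.png": b"\x89PNG...",
    "/b/logo.png": b"\x89PNG...",
    "/a/notes.txt": b"hello",
}
duplicates = find_duplicates(files)
```

Each group returned could then be collapsed so that all of its paths share one stored copy of the data.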

[057] According to one embodiment, the frozen file feature provides a simple 
mechanism to maintain prior versions of files. Utilizing a version save command in the 
user interface 210, a deferred copy of the file is created and frozen so it will never 
change. The frozen file is then identified in the database as a past version of the file. A 
past version of a file can be accessed to copy, link to or move it. However, it cannot be 
modified. When a version is saved, the volume manager 221 may also store additional 
information about the version, such as when and by whom it was saved. Also, comments 
about the version can be entered and saved by the volume manager 221. In a similar 
manner, a folder can also be saved, which preserves a frozen copy of everything in the 
folder. 

[058] Because information about associated files, such as versions, is stored in the 
database, accessing associated files is simple. A "show versions" option can be selected 
in the user interface 210. In some embodiments, a window will then display all extant 
past versions, which are all frozen. Any of the prior versions can be moved, opened, 
compared to other versions, or otherwise manipulated without changing the content of the 
version. Since information is stored about the timing of versions of all files, the volume 
manager 221 can provide a view of a folder hierarchy or an entire volume as of a given 
time. All of the parts of that view are prior frozen versions. 

[059] Similar information for copies of files may also be maintained. A "show 
copies" option may be selected from the user interface 210. In some embodiments, a 
window will then display a copy pedigree for a particular file. Such a copy pedigree may 
include all predecessor files, all descendant files, or some combination. As with versions, 
any of the copies can be moved, opened, compared to other copies, or otherwise 
manipulated without changing the content of the copy. Since information is stored about 
the timing of copies of all files, the volume manager 221 can provide a view of a folder 
hierarchy or an entire volume as of a given time. This allows users to view the migration 
and evolution of a particular file as well as identify the source of the particular file. 

[060] Every time changes are made to files, the volume manager 221 records what was 
done. When a file is copied, moved, deleted, or saved, a record is made. The system can 
then provide a history of any item, which shows where this item was copied from and to, 
who did it, when, and why. For a folder, the history includes a list of items that used to 
be in the folder but which were either deleted or moved from the folder. From the history 
list, items that have been moved or deleted can be restored, brought back to the folder, or 
copied back to the folder. 

[061] The volume manager 221 also provides linking capabilities. A link is a reference 
to whatever is at the end of the given path. The path can be relative, absolute, or it can be 
a URL. In some embodiments, a link can be "sticky," in that it dictates attributes of what 
it points to. For example, the link can include a reference to a file type (such as a PDF 
file), whether there has to always be something there at the end of the path, and whether 
the link will adjust to point to the new location if the referent moves. A link can be 
configured to behave like a Mac OS alias, Windows shortcut, or Unix symbolic link or 
hard link, appropriate to the platform from which it is accessed. A link can also be 
configured to keep a cached copy of whatever was there the last time the link was used, 
for example, a web page or a folder on a web site. 

[062] The volume manager 221 also provides functionality with respect to folders. One 
type of folder implemented by volume manager 221 is a query folder. A query folder can 
be created which encapsulates a set of search criteria and includes real-time-updated 
results of the search. The search can be a full-text search across one or more volumes, or 
it can be a tag search. 

[063] Query folders are stored in the volume manager 221 like ordinary folders. 
However, their uniquely formatted name or a special tag attribute indicates to the system 
that they are query folders and not regular folders. At the time that a query folder is 
enumerated, the query is processed, and the selected files are listed as being the content 
of the folder. In addition, when a new file is created, or when one of the tags associated 
with the query folder changes, the query is evaluated again, and an event is delivered to 
the client to indicate that a file should be added to or removed from the query folder. 
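
The query folder behavior can be sketched as follows for the tag-search case. This is an illustration under stated assumptions: tags are modeled as a dictionary per file, and the class QueryFolder and its methods are hypothetical names.

```python
class QueryFolder:
    """A folder whose contents are the current results of a tag search (toy model)."""

    def __init__(self, name, required_tags):
        self.name = name
        self.required_tags = required_tags   # e.g. {"status": "review"}

    def matches(self, tags):
        return all(tags.get(k) == v for k, v in self.required_tags.items())

    def enumerate(self, volume):
        # The query is processed at enumeration time, not stored as a fixed list.
        return sorted(p for p, tags in volume.items() if self.matches(tags))

    def on_tags_changed(self, path, old_tags, new_tags):
        # Re-evaluate the query for one file and report the event for the client.
        before, after = self.matches(old_tags), self.matches(new_tags)
        if after and not before:
            return ("add", path)
        if before and not after:
            return ("remove", path)
        return None


volume = {
    "/docs/spec.doc": {"status": "review", "owner": "Jane"},
    "/docs/plan.doc": {"status": "draft", "owner": "Ron"},
}
needs_review = QueryFolder("Needs review", {"status": "review"})
listing = needs_review.enumerate(volume)
event = needs_review.on_tags_changed(
    "/docs/plan.doc", {"status": "draft"}, {"status": "review"})
```

Here the listing contains only the file tagged for review, and changing the tag on the second file produces an "add" event for the client.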

[064] Another type of folder implemented by volume manager 221 is a merge folder. A 
merge folder includes items from a 'merge list' of other folders. An item in a folder in 
the merge list hides a like-named item in a folder farther down in the merge list. The 
merge is real-time, not a snapshot; as things appear and disappear in the merged folders, 
they appear and disappear in the merge folder contents. A merge folder can be 
configured to allow creation of new items in the merge folder so that they reside in the 
first folder in the merge list. A merge folder can also be configured to allow deletion of 
items from where they reside or merely to hide them from appearing in the merge folder. 
Items from the source folders appear in the merge folder as live copies. A combination 
of query folders and merge folders can be used to implement complex queries. 
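
The enumeration of a merge folder can be sketched as below. The sketch assumes each source folder is a dictionary of item names and that hidden items are kept as a set of names; the function name enumerate_merge_folder is hypothetical.

```python
def enumerate_merge_folder(merge_list, hidden):
    """merge_list: list of (folder_name, {item_name: item}) in priority order.
    hidden: set of item names hidden from the merge folder view."""
    merged = {}
    for folder_name, items in merge_list:
        for name, item in items.items():
            # An earlier folder hides a like-named item farther down the list.
            merged.setdefault(name, (folder_name, item))
    # Drop hidden items after all source folders have been collected.
    return {n: v for n, v in merged.items() if n not in hidden}


merge_list = [
    ("local", {"config.ini": "local settings", "notes.txt": "mine"}),
    ("shared", {"config.ini": "default settings", "readme.txt": "hello"}),
]
view = enumerate_merge_folder(merge_list, hidden={"notes.txt"})
```

Because the view is computed from the source folders each time it is requested, items that appear or disappear in a source folder appear or disappear in the result.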

[065] Merge folders are also stored in the volume manager 221. The underlying 
"source" folders know about each merge folder they are used by, and are also referenced 
by the merge folder. This allows the system to propagate changes in the source folder to 
the merge folder. The system can also warn the user about a potential conflict before a 
source folder is deleted. The merge folder also includes a list of edits that are applied to 
each of the source folders. If a file is deleted from a merge folder, for example, an edit is 
stored so that after the contents of all referenced source folders are collected, the edit list 
is applied, and the deleted file is removed from the enumeration before the final list is 
passed back to the user interface 210 for display to the user. 

[066] One aspect of the invention provides version control. A folder can be designated 
as a "Repository." In one embodiment, a repository folder requires that changes be made 
by doing a "drag-update" to the top-level repository folder itself - other changes to its 
contents (i.e., a piece at a time) are not allowed. To "check out a copy," a user makes a 
"regular" copy of the repository folder. Because of deferred copies, this operation is very 
fast. Users make whatever changes they need to make anywhere within the copy of the 
folder. Then the copied folder is dragged and dropped back to the repository folder. The 
user interface pops up a "check in" window that asks the user to include a note about the 
changes that were made. During the check-in process, the volume manager compares the 
version history of the new files with the versions that are already in the repository. This 
comparison allows it to identify conflicts. The user interface compare-and-merge tools 
are used to resolve any conflicts that may have arisen as a result of another user checking 
out the same hierarchy and changing any of the same files. 

[067] The file management system of the present invention allows folders, as well as 
files, to have type. The type is stored in the database with the appropriate folder 
information. A type can configure a folder to limit what can be in it and to optionally 
require certain contents. For example, a 'Group Role' folder is allowed to include only 
'User' files and 'Group Access' folders, as discussed below. 

[068] The listing of items in a folder is greatly enhanced by the file management system 
of the present invention. Any of the additional information stored with respect to files 
can be saved. Furthermore, special orderings of files can be used in displaying a list. 
The items in folders can be sorted by their name, size, modify time and certain other 
information, as in most file management systems. However, the user can also configure 
the user interface 210 to display tag names and values associated with the files in a 
folder. When the folder display is in this mode, the tags appear as column headings, and 
the tag values appear in those columns. The files can then be sorted based on those tag 
values, by clicking on the tag name at the top of the column. This is implemented in the 
user interface 210 as an extension to Windows Explorer known as a "Namespace 
Extension." The extension is told the name of the folder that it should display. It then 
sends a request to the volume manager 221 for a list of all of the tags used in that folder, 
and the value of each tag for every file in the folder. It uses that information to render the 
user interface 210 as described above. 

[069] The system can also display the date and time when an item was added to a 
folder, not just when it was created. 

[070] When applied on a network, the file management system is able to cache files for 
improved access while maintaining control. When a server volume is accessed, the 
volume manager 221 on the client creates an entry for the server volume in its local disk 
cache. From then on, even if disconnected from the server, the client can change 
anything that appears to be on the server volume, using whatever is cached locally. The 
system can also ensure that certain files from the server volume are always cached on the 
client, in case the client is disconnected or the server goes down. If a user wishes to 
always have an item available, the "keep local" option is selected from the user interface 
210. For a folder, all of that folder's contents, recursively, are affected when the "keep 
local" option is selected. If a user also wants to protect against the item being deleted, 
they should make a live copy. The client volume manager and the server volume 
manager work unobtrusively in the background together with the coherency manager to 
ensure that 'keep local' items remain in sync with the server. If the client is disconnected 
from the network, the coherency manager will orchestrate synchronization of the volume 
manager with the client cache upon reconnection. If changes have been made in the local 
cache while disconnected, there may be conflicts with changes on the server. In this case, 
the user interface 210 will work with the user to reconcile the differences. This is done 
in part through a set of compare-merge tools that are integrated into the user interface 
210. These tools allow the user to visualize the changes, and to either select the right 
version or merge changes from one file into another. 

[071] Since information about all changes to files and folders is maintained by the 
volume manager 221, undoing actions is fairly simple. The "Undelete" option in the user 
interface 210 first provides a listing of deleted items. While files are still deleted, they 
cannot be viewed or modified. When the desired file or folder is selected, the undelete 
command from the user interface 210 makes it viewable and modifiable again. 
Similarly, the same process can be used to reinstate a previous version of a file from a 
version listing. Also, the various actions taken with respect to a file or folder can be 
viewed and be reversed with the "undo" option. 

[072] Any change to a file or a folder is an event that can trigger another action by the 
file management system. Many pre-programmed actions can be selected and customized 
with drag and drop and form-fill-in actions. Actions can also be programmed as one 
would in a spreadsheet, using JavaScript, Java, or Visual Basic. The system can react in 
real time, similar to a recalculation of a spreadsheet when a cell is changed. 

[073] In some embodiments of the invention, every item in the memory volume has 
tags. A tag is a coupling of a tag type and a tag value. There are many built-in tag types, 
such as text, user, date, and icon. A tag can be added to an item, perhaps creating a new 
tag type in the process, and its value can be modified (except for some built-in "system" 
tags). 

[074] An email integration package allows email messages to be brought into the 
system to be manipulated as files in folders and also to be associated with files and 
folders. To determine whether there has been any email discussion about a file, 
right-click on the file and select the "Messages" command. The user interface will then 
provide the email history associated with this file. By clicking the "New Message" 
button on the window toolbar, the user may select the people to whom they want this 
message to go (the system knows who has participated in the discussion so far). The user's 
usual email application (such as Microsoft Outlook) opens up with a new message in it, 
and in the body of the message there is a special URL with a special protocol (such as 
"itc://") that refers to the file being discussed in the email. 

[075] Because the present invention is a peer-to-peer system, any user of the system 
reading the messages including "itc://" URLs can navigate easily from the message to the 
referenced file — not a copy, but the identical file in the space shared by the peers. 

[076] In fact, the URL in the message refers to a specific version of the file, the version 
that was current when the email was written. If the URL is opened, the user interface 
brings up a Windows Explorer window to the folder that includes the file, selects the file, 
and opens a "choices" window. The choices window offers to show other emails about 
the file or to show the file as it was when the email was sent. If the file has been revised 
since then, the system shows the version history, allows a selection between the 
URL's version and the current version, and offers to show a comparison of the two 
versions. 

[077] The system provides access control through use of management folders. In one 
embodiment, every volume has a management folder with two subfolders: users and tags. 
The file management system grants access to an item (file or folder) based on who the 
user is and the groups to which the user belongs. There are three kinds of typed folders 
found in the users subfolder: "group", "volume group", and "group from authentication 
server" (the latter two are subclasses of folder type "group"). These folders can include 
other group folders and special files of type "user". 

[078] The system may rely on one or more designated outside authorities to authenticate 
users. This authority can be the local computer, a Windows Active Directory server, a 
Kerberos server, LDAP, etc. For every authentication source, there is a corresponding 
typed folder of type "volume group." For each user authenticated by that source, there is 
a corresponding user file in the folder. The user file is an XML file that includes 
authentication source information and user details, such as full name, phone numbers, etc. 
For each group maintained by the authentication server, there is a typed folder of type 
"group from authentication server" in which there are live copies of all the users that are 
members of the group. For example, if the system has been configured to use the 
Windows domain Active Directory server called CORPORATE, the users area might 
include these: 

/users/corporate/Ron 
/users/corporate/Jane 
/users/corporate/Fred 
/users/corporate/admin/Fred 

[079] The /users/corporate/ folder (which is a typed folder of type "group of 
authenticated users") and everything under it include information that identifies the 
CORPORATE Windows domain as their source. The /users/corporate/admin/ folder is a 
typed folder of type "group from authentication server", and the user file Fred in it is a 
live copy of /users/corporate/Fred (because both files represent the same data). A typed folder 
of type "volume group" is a convenient way to establish groups using the user interface. 
These groups are known only to the system, not to the authentication source. They can be 
useful because they allow groups within groups. 

[080] An authentication group folder is special in how it treats the user files and group 
folders included in it, and it allows only those types of items in it. Unlike traditional 
systems, the present invention allows a group to include other groups as well as users. 
The live copy feature makes organizing users and groups easy. Each item (folder or file) 
has one or more owners. An owner is a user or group. An owner is allowed to change 
access settings for itself and for other users and groups. 
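
Because a group may include other groups, a membership check has to follow nested groups. The sketch below shows one way to do so; it assumes groups are held as a mapping from group name to a set of member names, and the function name is_member is hypothetical.

```python
def is_member(user, group, groups, _seen=None):
    """groups maps a group name to the set of user and group names inside it."""
    seen = _seen if _seen is not None else set()
    if group in seen:
        # Guard against groups that (directly or indirectly) contain each other.
        return False
    seen.add(group)
    for entry in groups.get(group, ()):
        if entry == user:
            return True
        # An entry that is itself a group is searched recursively.
        if entry in groups and is_member(user, entry, groups, seen):
            return True
    return False


groups = {
    "corporate/admin": {"Fred"},
    "reviewers": {"Jane", "corporate/admin"},
}
```

With this data, Fred is a member of "reviewers" through the nested "corporate/admin" group, while Ron is not a member of either group.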

[081] The system uses event-driven actions extensively, and custom actions can be 
established to do simple but powerful things. For example, the system can notify the 
right people when a work product file is ready for review. Using the event-driven actions, 
complex workflow automation can be easily built into the user's everyday work area, 
folders and files. 

[082] The system tracks various aspects about the usage of files and folders by users. 
Furthermore, it can be customized to ask for more specific information. Typical 
document management systems are limited because they are not able to control the files 
on users' desktop computers. Users often have to extract files from the document 
management system onto their desktop computer (thereby out of reach and out of the 
control of the document management system) and then back into the document 
management system at some later time. According to one aspect of the invention, files 
never leave the system. 

[083] The present invention eliminates bad copies in a variety of ways. For example, in 
a conventional system, a user may wish to copy an item from a server or a CD-ROM to 
the user's local machine. If the user's purpose for making the copy is convenience, the 
invention provides a sync link from the item on the server to the local volume. If the 
user's purpose is for speed of access, the invention may provide a cached copy on the 
local volume. If the user's purpose is to protect against the server going down or the item 
being deleted from the server or unavailability of the CD-ROM, the invention may 
provide a live copy of the item on the local volume. If the user's purpose is to have 
access to the item when not on the network, the invention provides the keep local feature. 

[084] In other examples, the user may wish to copy an item from the local machine to 
the server or a removable disk. If the user's purpose for making the copy is for backup, 
the invention provides automatic backup to the server. If the user's purpose is to publish 
the item for others to access, the invention provides a live copy on the server and 
furthermore may provide permissions to control which users have access. If the user's 
purpose is to capture and maintain a version, the invention provides the snapshot feature. 

[085] In other examples, the user may wish to copy an item from one folder to another 
folder for organizational convenience (i.e., have all related files in one folder). In this 
case, the invention provides live copies or, alternatively, special folders that have links to 
the various items that should be included therein. 

[086] In another example, the user may wish to copy items to a zip file or other archive 
format for reasons similar to those described above. If the user's purpose is to keep a 
snapshot of a current version of the items, the invention provides the freeze or save 
features. If the user's purpose is to send these items to another user, the invention 
provides a link to the saved version that then can be forwarded to the other user. If the 
user's purpose is to send these items in a zip format, the invention provides an "extract 
as..." folder feature. 

[087] Fig. 5 illustrates a block diagram of an embodiment of a file management system in 
further detail. As illustrated therein, file management system 500 interfaces with a file 
system interface 502. File system interface 502 allows file management system 500 to 
communicate with other system devices (not illustrated) using various protocols. In one 
embodiment of the present invention an SMB protocol interface box may be used. As is 
known, SMB is a standard protocol used, for example, by Windows to implement file 
sharing. With the SMB protocol interface box, file management system 500 appears like 
a network drive to other system devices. As would be apparent, other interfaces could be 
used including those that would support different file-access protocols or that would 
allow file management system 500 to appear as a native file system. 

[088] File system interface 502 provides a standard API that functions to implement 
standard file system calls (e.g., read/write, open, close, etc.). File system interface 502 
passes system calls that it receives from other system devices to a disk adapter 504 
(sometimes referred to elsewhere herein as a grok adapter) that redirects and implements 
those system calls in accordance with the present invention. 

[089] In one embodiment of the present invention, disk adapter 504 implements system 
calls or "requests" such as those illustrated in request block 506. These requests include: 
"list" which is used to enumerate a folder; "stat" which gets information about a 
particular file such as size, type, etc.; "mkdir" which creates a directory; "delete" which 
deletes a file, a folder, etc.; "open" which opens or creates a file; and "close" which 
closes a file. These are referred to herein as file system requests. Other requests such as 
"read," "write," "seek," etc., may also be included as would be apparent and are referred 
to as file or "blob" requests. In general, the operation and use of these requests by other 
system devices are well known. 

[090] In one embodiment of the present invention, certain requests and in particular, 
read and write requests, are actually diverted inside disk adapter 504 directly to streams 
that exist on an underlying file system 508. In one embodiment, file system 508 is an 
NTFS-based file system. Other file systems, such as a FAT file system, may be used as 
would be apparent. However, the NTFS file system provides a more robust system, with 
some built-in integrity-preserving capabilities, than do FAT file systems. Furthermore, 
NTFS more readily allows millions of files to be located in a single folder. 

[091] When disk adapter 504 detects read or write requests, they are diverted directly to 
file system 508. In one embodiment, these requests do not pass through the remainder of 
file management system 500, in part, to avoid processing of large data streams, or 
"blobs," by a transactional database. However, in other embodiments, for example, in 
those that implement a custom object store, these blobs may pass through the file 
management system 500 in order to provide transactional integrity (i.e., all transactions 
fully complete or fully fail) as will become apparent from the discussion below. 

[092] One aspect of file management system 500 is to manage all of the metadata that 
surrounds that blob as opposed to managing the blob itself. This metadata may include, 
for example, filename, tags associated with a file, a folder in which the file resides, a time 
of its creation, a time of its last modification, etc. In some embodiments, file 
management system 500 may also manage blob creation (e.g., opening a zero length file) 
and deletion. 

[093] When a request from a file system arrives, disk adapter 504 creates a request 
object that encapsulates any components of the request for operation with a transactional 
database. In some embodiments of the present invention, this encapsulation allows file 
management system 500 to be fully asynchronous in that it allows request objects to be 
queued for subsequent completion without tying up system operation. In some 
embodiments, disk adapter 504 creates a different request object for each type of 
incoming request. In one implementation, each request ("list," "stat," "mkdir," etc.) 
corresponds to a subclass of the base class "request." 

[094] For example, a "mkdir" request object would encapsulate all of the parameters for 
the mkdir request including a name of the directory to be created and a user name 
associated with the person requesting the creation. The request object is then passed to a 
system call dispatcher 507. System call dispatcher 507 passes the request object to a 
thread pool 510 to be executed. Thread pool 510, in turn, wraps each request object or 
each action associated with the request object inside a transaction for use with the 
transactional database. 
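
The request hierarchy described above can be sketched as follows. This is a minimal illustration only: the class names, fields, and accessor methods are assumptions, since the description states only that each request type corresponds to a subclass of a base "request" class and that a "mkdir" request carries a directory name and a user name.

```java
// Hypothetical names throughout; only the base-class/subclass relationship
// and the mkdir parameters come from the description.
abstract class Request {
    private final String userName;   // user who issued the request

    Request(String userName) { this.userName = userName; }

    String userName() { return userName; }

    /** Short name of the file system call, e.g. "mkdir". */
    abstract String operation();
}

final class MkdirRequest extends Request {
    private final String directoryName;   // name of the directory to create

    MkdirRequest(String userName, String directoryName) {
        super(userName);
        this.directoryName = directoryName;
    }

    String directoryName() { return directoryName; }

    @Override String operation() { return "mkdir"; }
}

final class StatRequest extends Request {
    private final String path;   // assumed parameter for a "stat" request

    StatRequest(String userName, String path) {
        super(userName);
        this.path = path;
    }

    String path() { return path; }

    @Override String operation() { return "stat"; }
}
```

Because each request is a self-contained object, it can be placed on a queue and executed later without reference to the call that produced it.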

[095] In one embodiment, thread pool 510 operates on a parallel set of objects referred to
as task objects. Each task object is derived from a class of objects referred to as a
transaction wrapper object.
Thus, system call dispatcher 507 passes the request object to the task object which is then
handed off to thread pool 510 to be executed. One aspect of this embodiment is that the
task objects may sit in a queue while awaiting processing by thread pool 510. As would 
be apparent, thread pool 510 also provides a mechanism by which file management 
system 500 may asynchronously operate, thereby alleviating server overuse and 
providing improved performance by minimizing connections to the underlying object 
store. 

[096] Thread pool 510 grabs task objects one at a time and calls a run method 
associated with the task object as would be apparent. This run method within the 
transaction wrapper handles the object store transactions. More particularly, the run 
method calls a do transaction method, which is overridden inside these task objects. In 
this way, each of the task objects does not require all of the external wrapper code that knows
how to manage the transactions. The particular task object performs its specific task
(e.g., creates the directory by doing the appropriate object manipulations) and then
returns. So the transaction wrapper creates or starts a transaction, calls its specific
do transaction method, and then calls the commit transaction routine.

[097] When two tasks or threads attempt to modify the same object(s), the transactional
database will detect it and prevent the transaction from succeeding by throwing an 
exception. The transaction wrapper manages those exceptions, by for example, 
reattempting the transaction some number of times. In one embodiment, if the 
transaction continues to fail, the exception manager attempts to obtain exclusive access to 
the database thereby blocking out any other transactions while it completes the 
transaction. 
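
The wrapper pattern and retry behavior described in the two preceding paragraphs can be sketched as below. The interface `ObjectStoreSession`, the exception type `ConflictException`, the method names, and the retry count of three are all assumptions made for illustration; the description says only that the transaction is reattempted "some number of times" before exclusive access is sought.

```java
// Assumed stand-in for the session API of the transactional database.
interface ObjectStoreSession {
    void begin();
    void commit();
    void abort();
    void beginExclusive();   // assumed: blocks out all other transactions
}

// Assumed exception thrown when two transactions modify the same object.
class ConflictException extends RuntimeException {}

abstract class TransactionWrapper implements Runnable {
    private static final int MAX_RETRIES = 3;   // illustrative value only
    private final ObjectStoreSession session;

    TransactionWrapper(ObjectStoreSession session) { this.session = session; }

    /** Task-specific work; each task object overrides only this method. */
    protected abstract void doTransaction();

    @Override
    public final void run() {
        for (int attempt = 0; attempt < MAX_RETRIES; attempt++) {
            session.begin();
            try {
                doTransaction();
                session.commit();
                return;                      // success
            } catch (ConflictException e) {
                session.abort();             // conflict detected; try again
            }
        }
        // Repeated failure: obtain exclusive access and run once more.
        session.beginExclusive();
        doTransaction();
        session.commit();
    }
}
```

A task such as directory creation would subclass `TransactionWrapper` and implement only `doTransaction`, leaving begin, commit, and retry handling to the base class.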

[098] Before discussing each of the task objects in further detail, a volume manager 
object 515 and an object store 520 are described. According to one embodiment of the 
invention, volume manager object 515 manages much of the non-persistent data that's 
associated with volume 525, while volume 525 stores the persistent data. 

[099] When disk adapter 504 is first initialized, it receives a volume name representing 
a volume 525 and is instructed to initialize volume 525. Next disk adapter 504 opens 
volume 525 in similar fashion to a conventional file system mount command, by calling
volume manager object 515. During this initialization, disk adapter 504 calls a static
method inside volume manager object 515 to ask for an instance of volume manager 515
associated with the volume name. The static method either returns an existing volume 
manager object or creates one and initializes it. If the volume manager object exists, it's 
just looked up in a hash table by the volume name and returned. If not, the volume 
manager goes out to the database, establishes a connection to the object store 520 and 
does a lookup to see if a volume object has been stored there. If it has been stored in 
object store 520, then that volume object is read in and stored in the volume manager. So 
where the volume object has been previously created, mounting comprises either reading
that volume object or getting a reference to that persistent volume object from the object
store and storing a reference to that volume object in the volume manager. 

[0100] In one embodiment, object store 520 corresponds to an object store. In this 
embodiment, since each object reference is owned by a particular session, it is not 
possible to pass a standard reference to an object from one session to another. In this 
embodiment, object store 520 provides a mechanism referred to as a shared object 
reference that allows access to these persistent objects with references unique to each 
session. After the volume manager 515 mounts the volume 525, a reference to the 
volume 525 is stored in a shared object reference in the volume manager 515. 

[0101] When the volume object does not already exist in object store 520, volume 
manager 515 creates volume object 525, causes it to be initialized, and stores it in object 
store 520. When volume 525 is initialized, a root slot is created along with a root folder 
and a number of folders and tags associated with a tag volume. 
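
The mount sequence described above (look up an existing volume manager by name, otherwise read the volume object from the object store, otherwise create and store it) can be sketched as follows. The `ObjectStore` interface, the `Volume` class, and all method names are hypothetical; initialization of the root slot, root folder, and tag volume is omitted.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the persistent object store.
interface ObjectStore {
    Volume lookupVolume(String name);   // returns null if no volume is stored
    void storeVolume(Volume volume);
}

// Hypothetical stand-in for the persistent volume object.
final class Volume {
    final String name;
    Volume(String name) { this.name = name; }
}

final class VolumeManager {
    // One manager per volume name, kept in a hash table.
    private static final Map<String, VolumeManager> MANAGERS = new HashMap<>();

    private final Volume volume;

    private VolumeManager(Volume volume) { this.volume = volume; }

    Volume volume() { return volume; }

    /** Returns the existing manager for the name, or mounts the volume. */
    static synchronized VolumeManager forName(String name, ObjectStore store) {
        VolumeManager existing = MANAGERS.get(name);
        if (existing != null) {
            return existing;
        }
        Volume volume = store.lookupVolume(name);
        if (volume == null) {
            // Not stored yet: create it and persist it.
            volume = new Volume(name);
            store.storeVolume(volume);
        }
        VolumeManager created = new VolumeManager(volume);
        MANAGERS.put(name, created);
        return created;
    }
}
```

Calling `VolumeManager.forName` twice with the same name returns the same manager instance, which mirrors the hash table lookup in the description.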

[0102] Volume manager object 515 also manages access to sessions of object store 520. 
In one embodiment, a read/write lock is created and anchored in the volume manager. 
Any class in file management system 500, for example, transaction wrapper 510, starts a 
transaction by calling a method in the volume manager to begin the transaction. More 
particularly, the volume manager includes transaction begin and transaction commit 
methods. When the transaction begin is called, the volume manager must acquire a read 
lock before it calls the underlying object store begin transaction method. 

[0103] A read/write lock provides for multiple readers. So while multiple read locks can 
be acquired, only one write lock can be acquired. This lock operates as follows. When a
write lock acquire is called or issued, it suspends or waits until all read locks have been
released. Subsequent read lock acquires that arrive after the write lock acquire is called 
are suspended until the write lock acquire completes and the write lock release completes. 

[0104] In one embodiment of the invention, a read lock is acquired in the transaction
begin method and the read lock is released in the transaction commit method. In this 
way, multiple threads and multiple sessions are allowed to be active at the same time. 
However, to accommodate instances where a write conflict occurs such as described 
above, retry logic is incorporated into the transaction wrapper. Thus after trying and 
failing to execute a transaction multiple times, the transaction wrapper calls an exclusive 
begin method inside the volume manager that calls a write lock acquire on the lock object 
that's used for the normal transactions. This has the effect of letting all of the normal 
transactions that are in progress complete, at which point in time, that session gains 
exclusive access to the database, and it can then complete its transaction without fear of 
interference from other sessions. 
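
The locking scheme above can be approximated with the standard `java.util.concurrent.locks.ReentrantReadWriteLock`. This is a present-day sketch under stated assumptions, not the lock implementation of the described embodiment: the class name `TransactionGate` and its methods are hypothetical, the object store calls are reduced to comments, and fair mode is chosen because it makes later readers queue behind a waiting writer, as the description requires.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical class anchored in the volume manager.
final class TransactionGate {
    // Fair mode: a reader arriving while a writer waits is queued behind it.
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock(true);

    /** Normal begin: many transactions may hold the read lock at once. */
    void transactionBegin() {
        lock.readLock().lock();
        // The object store's begin-transaction call would go here.
    }

    /** Normal commit: must run on the thread that called transactionBegin. */
    void transactionCommit() {
        try {
            // The object store's commit call would go here.
        } finally {
            lock.readLock().unlock();
        }
    }

    /**
     * Exclusive begin: waits for in-flight transactions to finish.
     * The caller must already have released its own read lock, because
     * this lock class does not support upgrading from read to write.
     */
    void exclusiveBegin() {
        lock.writeLock().lock();
    }

    void exclusiveCommit() {
        lock.writeLock().unlock();
    }

    int activeTransactions() { return lock.getReadLockCount(); }

    boolean isExclusive() { return lock.isWriteLocked(); }
}
```

A transaction wrapper would call `transactionBegin` and `transactionCommit` on its normal attempts and fall back to `exclusiveBegin` only after its retries are exhausted.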

[0105] As mentioned above, one embodiment of object store 520 may comprise an object 
store. In this embodiment, object store 520 stores Java objects in a persistent store on 
disk using a sophisticated caching and persistence mechanism. Object store 520 allows 
for multiple sessions with each single session having a consistent view of the database. 
As a session begins a transaction, object store 520 creates a snapshot of the database that 
remains consistent until the end of that transaction. When the transaction commits, all of 
the objects changed by the transaction are written to the database in an atomic fashion 
using logging mechanisms for recovery or rolling back.

[0106] In one embodiment of the invention, the volume manager provides in general a
one-to-one association between threads and sessions. Because each session has a 
consistent view of the database, it cannot damage some other session. 

[0107] Most of the task objects discussed above include a path name as an input. One 
function the file management system 500 performs is to map conventional path names 
(e.g., c:/folder/subfolder/file.doc, etc.) into database objects of various kinds. The
volume manager 515 parses the path name and performs various table lookups to identify
a node object. The volume manager begins at a root object anchored in the volume
object and "walks" the graph of objects from the root down to the node object. The
objects that the volume manager walks through while parsing are illustrated in Fig. 5 as
file system data structures 530. 

[0108] File system data structures 530 derive from a super class called file system node, 
or FS node, and include a slot object 532, an entry object 534, and an item object 536 that 
includes a container object 537 and a stream object 538. These objects in file system data 
structures 530 represent files or other data structures that reside on a physical disk.

[0109] Slot object 532 manages a name of a file or a folder. Entry object 534 manages 
tags and attributes. Tags are described in detail below. Attributes describe whether the 
file is frozen, read only, etc. Container object 537, which corresponds to folders, 
manages all of the data structures associated with a folder. Stream object 538, which 
corresponds to blobs, manages all of the objects or all of the items or all of the pieces of 
data associated with a blob including, for example, the name of the blob on the native file 
system.

[0110] In one embodiment of the invention, each file or folder corresponds to a triple
including a slot 532, an entry 534 and an item 536. More particularly, each file 
corresponds to a triple of a slot, an entry and a stream 538, while each folder corresponds 
to a triple of a slot, an entry, and a container 537. The objects forming a triple are linked 
together in various ways to achieve some of the aspects of the present invention including 
live copies and deferred copies. 

[0111] Container 537 allows file management system 500 to map path name components 
into slots 532. In some embodiments, container 537 also includes information about 
whether or not deleted files should be shown when the folder is enumerated. In other 
embodiments, container 537 identifies a type of the folder, for example, whether the
folder is a normal folder, a query folder, or a search folder. Container 537 may also
include maintenance data that takes a file or folder name and maps it to a slot to facilitate
certain types of lookups. Container 537 may also include methods within the container
class that, for example, enumerate the folder.
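
The path walk performed by the volume manager, together with the container's mapping of path name components to slots, can be sketched as follows. The model is deliberately simplified and its names are hypothetical: the entry object, tags, and attributes are omitted, so a slot here refers directly to its item, whereas in the description a slot reaches its item through an entry.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified, hypothetical model of the file system node classes.
abstract class Item {}

final class StreamItem extends Item {            // corresponds to a file (blob)
    final String nativeFileName;                 // name of the blob on disk
    StreamItem(String nativeFileName) { this.nativeFileName = nativeFileName; }
}

final class Container extends Item {             // corresponds to a folder
    private final Map<String, Slot> children = new HashMap<>();
    void add(Slot slot) { children.put(slot.name, slot); }
    Slot lookup(String name) { return children.get(name); }
}

final class Slot {
    final String name;
    final Item item;   // simplification: the entry object is skipped
    Slot(String name, Item item) { this.name = name; this.item = item; }
}

final class PathResolver {
    /** Walks from the root slot down to the named node; null if absent. */
    static Slot resolve(Slot root, String path) {
        Slot current = root;
        for (String component : path.split("/")) {
            if (component.isEmpty()) {
                continue;                        // leading or doubled slash
            }
            if (!(current.item instanceof Container)) {
                return null;                     // cannot descend into a file
            }
            current = ((Container) current.item).lookup(component);
            if (current == null) {
                return null;                     // no such name in this folder
            }
        }
        return current;
    }
}
```

Each loop iteration corresponds to one step of the "walk": the current folder's container maps the next path component to a slot.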

[0112] Stream 538 is relatively simple by comparison to container 537. In one 
embodiment, stream 538 includes a string that identifies the name of the file on the disk 
in file system 508 where the actual blob resides. Stream 538 may also include a hash ID. 
In one embodiment, this is a cryptographically strong hash of the contents of the file. 
Each time a file is modified, this hash value is recalculated, to allow the tracking of 
identical files according to the invention. 

[0113] Entry 534 manages any tags that are attached to a file. Since multiple slots 532 
can refer to the same entry 534, the entry object also includes a list of all of the slots 532
referring to that entry 534. This may occur, for instance, with hard links. Entry 534 may
also include a reference to the underlying item 536, and references to a revision chain 
(e.g., the previous version to this one and the next version). According to one 
embodiment of the invention, each entry 534 lives somewhere on a revision chain — it 
may be the only object on that chain or one of many. In some embodiments, the revision 
chain is linear. In other embodiments, the revision chain may include branches that may 
allow an entry to reside on any number of revision chains. In further embodiments, a 
similar mechanism may provide for a copy history that records where this entry was 
copied to, where it was copied from, etc. Each entry 534 may also include one or more 
attribute flags including a frozen attribute, a repository attribute, a free text indexer 
attribute, and a read only attribute.

[0114] Entry 534 also manages a hash table that maps tag names to their corresponding 
data structures as will be described in further detail below. Entry 534 may also include 
methods for manipulating revision lists, for setting tags, for removing tags, for copying 
tags to another entry, and for updating dynamic folders. 

[0115] File management system 500 also includes a tag object 540. Tags correspond to a 
name/value pair that is associated with either a file or a folder. As discussed above, entry 
534 is the primary object to which tags are attached. Because both files and folders have 
an entry object, they can both have tags. According to the invention, tag look-ups are 
used in many different places and for many different reasons in the system. As a result,
their implementation requires speedy operation. In order to provide the necessary speed,
in one embodiment of the invention, all tag names are stored in a large bi-directional hash
table. In other words, the hash table allows the identification of all objects that have a
particular tag associated with them as well as the identification of all tags associated with
a particular object. 

[0116] In one embodiment of the invention, a hash table is anchored in the volume object 
525, and is used to look up all tag names. This hash table receives a tag name and returns 
a single name holder object 541. Name holder 541 includes the name of the tag and a set 
of all of the associated value holders 542 for that name. Value holder 542 includes the 
value of the tag. In other words, name holder 541 includes the name of the tag and value 
holder 542 includes the value of the tag. In one embodiment of the invention, a single 
name can be associated with many values. 

[0117] Tags can be attached to either entry objects 534 or slot objects 532. Tags that are
attached to an entry object are shared by all slots linked to that entry. When referenced
with respect to tags, slots and entries together are referred to as taggable objects. Tags
attached to a slot are visible only for that slot. File names, for example, may be stored as 
slot tags, since they are different for each slot. File type and file size may be stored as 
entry tags, since they do not change based on the name of the file or the folder in which it 
is located. Slot tags are identified by the prefix "slot." For example, "slot.name" 
includes the file name. Most other tag names are attached to entry objects. 

[0118] Each value holder 542 includes a value and a reference to a collection of taggable 
objects (entry objects 534 or slot objects 532) that share that same name/value pair. This
allows file management system 500, then, to easily and quickly determine which entry or 
slot object is associated with a particular name/value pair by iterating over the set of 
value holders held by the name holder. In addition, this allows all of the entry or slot
objects that are associated with a particular tag or any value of a particular tag to be 
determined. 

[0119] Using these data structures, a given tag name may be associated with multiple tag 
values at the same time for each entry. For example, while it is intuitive that a name can 
have one value for one file and a different value for a different file, a single tag name can 
also have multiple values for the same file. 

[0120] To accommodate a reverse process, a hash table is anchored in taggable objects, 
whose keys are tag names, and whose values are sets of value holder objects for each of 
the values that is referenced by that taggable object. This allows file management 
system 500 to identify all of the tags that are associated with an entry or slot. More 
particularly, the value holder object has a reference that points back to its corresponding 
name holder. So from a taggable object, all of the value holder objects can be determined 
which provides the values of the tags, and from those, the tag name and other files with 
the same tag name can also be quickly identified. 
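
The bi-directional tag index described in the preceding paragraphs can be sketched as below. Class and method names are hypothetical, and one detail is an assumption made for brevity: the name holder keeps its value holders in a map keyed by value rather than in a plain set, which avoids iterating when a specific name/value pair is wanted.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Stands in for an entry or slot object.
final class Taggable {
    final String label;
    // Reverse index: tag name -> value holders referenced by this object.
    final Map<String, Set<ValueHolder>> tags = new HashMap<>();
    Taggable(String label) { this.label = label; }
}

final class ValueHolder {
    final NameHolder name;                          // back-reference to the tag name
    final String value;
    final Set<Taggable> objects = new HashSet<>();  // objects sharing this pair
    ValueHolder(NameHolder name, String value) { this.name = name; this.value = value; }
}

final class NameHolder {
    final String name;
    final Map<String, ValueHolder> values = new HashMap<>();
    NameHolder(String name) { this.name = name; }
}

final class TagIndex {
    // Forward index (anchored in the volume object in the description).
    private final Map<String, NameHolder> names = new HashMap<>();

    void setTag(Taggable target, String name, String value) {
        NameHolder nh = names.computeIfAbsent(name, NameHolder::new);
        ValueHolder vh = nh.values.computeIfAbsent(value, v -> new ValueHolder(nh, v));
        vh.objects.add(target);
        target.tags.computeIfAbsent(name, n -> new HashSet<>()).add(vh);
    }

    /** Forward lookup: all objects carrying the given name/value pair. */
    Set<Taggable> objectsWith(String name, String value) {
        Set<Taggable> result = new HashSet<>();
        NameHolder nh = names.get(name);
        if (nh != null && nh.values.containsKey(value)) {
            result.addAll(nh.values.get(value).objects);
        }
        return result;
    }

    /** Reverse lookup: all values of one tag name on the given object. */
    static Set<String> valuesOf(Taggable target, String name) {
        Set<String> result = new HashSet<>();
        Set<ValueHolder> holders = target.tags.get(name);
        if (holders != null) {
            for (ValueHolder vh : holders) {
                result.add(vh.value);
            }
        }
        return result;
    }
}
```

Setting the same tag name twice with different values on one object leaves both values in place, which corresponds to the multi-valued tags described above.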

[0121] In addition to tags, file management system 500 includes mechanisms for causing 
side effects to normal file system operations. These mechanisms are referred to as 
triggers. In one embodiment of the invention, a trigger 545 is implemented around 
various requests. The triggers can be invoked before and/or after each of the various 
requests, for example, to veto the operation, to indicate or record that the request either is 
about to happen or just completed, or to cause various more complex actions to take 
place, such as setting tags or creating new files or performing operations over a network. 
Triggers may also be invoked if changes are made to various tags, either globally
(regardless of the file to which the tag is attached) or locally (only when the tag is
attached to a specific file), as would be apparent.

[0122] In one embodiment of the invention, trigger 545 includes a close trigger 546 and 
an email trigger 547. When a file is modified and closed, then close trigger 546 is 
invoked. When a file is moved from one folder into another, then email trigger 547 is 
invoked. 

[0123] In one embodiment of the present invention, when close trigger is invoked, it can 
call an external program whose purpose is to determine the MIME type of the file. 
Volume manager 515 makes an initial assumption about the type of the file based on its 
file extension, based on a list that maps an extension string to a human-readable file type, 
and another list that maps an extension to a MIME type. However, if a file's extension is 
not in those lists, the close trigger will call an external program that opens the file, reads 
the first few bytes, and, based on a set of rules, determines what the MIME type of the 
file is. 

[0124] The output of the external program is captured and stored into two tags in the file
management system 500 referred to as system tags. System tags differ from other tags in 
file management system 500 in that they cannot be directly modified by users of file 
management system 500. According to one embodiment of the invention, system tags 
start with the keywords "sys." or "slot.sys." for slot tags. Thus, "sys.mime" and
"sys.type" include the MIME type information - the actual MIME type is included in 
sys.mime and a human readable version of the MIME type is included in sys.type. As 
thus described, these two system tags are determined when the close trigger is invoked.

[0125] In some embodiments of the invention, when the close trigger is invoked, a
request is queued for a cryptographic hash to be computed for the file. As this
computation is both CPU and I/O intensive, it is queued for subsequent background
processing so as to not delay the close operation as would be apparent. In one 
embodiment, a single background thread is used for computing these hashes. 

[0126] In a similar manner, the close trigger may also queue a request to index the file. 
Indexing the file facilitates free-text search of the contents of that file. In one 
embodiment of the invention, file management system 500 integrates with a third-party 
free-text search engine referred to as Lucene, though other engines could be used as
would be apparent. Indexing may also be done by a single background thread. 

[0127] When an email trigger is invoked, an email may be sent to a user based on various 
tags that are attached either to a file (for example, to send an email when the file is 
modified), or that are attached to a particular tag (for example, to send an email when the 
tag is modified). In some embodiments of the present invention, the contents of the email 
are static. In other embodiments, the contents are fully configurable based on other tags 
that could be read either from the file itself or from the tag volume. 

[0128] When the email trigger is invoked, it evaluates various conditions and determines 
whether to send an email. For example, if a file is being dragged into a folder, the email 
trigger may be invoked. The email trigger would determine the parent folder associated 
with the destination of the file and determine whether the tags on that folder indicate that 
an email should be sent. If so, in one embodiment of the invention, the email trigger 
includes code to connect to an email server (whose IP address is specified in a specific
tag) and to deliver an email thereto.

[0129] Different triggers may be called based on different system events, as have been 
described. The name of the trigger may be specified in a tag. When the file management 
system 500 executes the trigger, it dynamically loads the trigger software, and calls it 
according to a predefined interface. In one embodiment of the invention, the triggers 
may be Java class files; a Java class loading mechanism is used to load the software; and 
a Java interface is used to specify the standard calling conventions. For example, a file
"file.txt" may have a tag called "trigger.tag.my.tag" set to the value "MyTrigger." In this
example, whenever the tag "my.tag" for "file.txt" changes to a new value, file
management system 500 loads a Java class called "trigger.MyTrigger" and then uses the 
"Trigger" interface to invoke that code. 
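
The dynamic loading step can be sketched with standard Java reflection. The shape of the `Trigger` interface (a single `tagChanged` method) and the helper class `TriggerLoader` are assumptions; the description names only a "Trigger" interface and the class name "trigger.MyTrigger".

```java
// Hypothetical calling convention; the real interface is not specified.
interface Trigger {
    void tagChanged(String path, String tagName, String newValue);
}

final class TriggerLoader {
    /** Maps a tag value such as "MyTrigger" to the class name "trigger.MyTrigger". */
    static String classNameFor(String tagValue) {
        return "trigger." + tagValue;
    }

    /** Loads the named class and instantiates it through the Trigger interface. */
    static Trigger load(String className) throws ReflectiveOperationException {
        Class<?> raw = Class.forName(className);
        // asSubclass throws ClassCastException if the class is not a Trigger.
        Class<? extends Trigger> type = raw.asSubclass(Trigger.class);
        return type.getDeclaredConstructor().newInstance();
    }
}
```

With a class `trigger.MyTrigger` on the class path, `TriggerLoader.load(TriggerLoader.classNameFor("MyTrigger"))` would return an instance whose `tagChanged` method could then be invoked when the tag "my.tag" changes.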

[0130] As mentioned above, the invention provides for placing tags on tags. In one 
embodiment of the invention, this is implemented using a tag volume where all tags in 
file management system 500 are reflected as folders. In this embodiment, the tag volume 
itself corresponds to /volume root/tags/ and tags in file management system 500 descend 
from this folder. For example, if you have a tag referred to as "sys.tag," within the tag 
volume, it would be reflected in the filesystem as a folder called /volume 
root/tags/sys/tag. According to one aspect of the invention, "dots" in the tag name are 
replaced with "slashes" and appended onto a prefix for the tag volume. Each time a new 
tag is created, a corresponding folder under that prefix is also created. 
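
The mapping from a tag name to its folder in the tag volume is a simple string transformation, sketched below. The class and method names are hypothetical, and the tag volume prefix is passed in as a parameter rather than assumed.

```java
final class TagVolumePaths {
    /** Replaces dots in the tag name with slashes and appends it to the prefix. */
    static String folderFor(String tagVolumePrefix, String tagName) {
        String prefix = tagVolumePrefix.endsWith("/")
                ? tagVolumePrefix
                : tagVolumePrefix + "/";
        return prefix + tagName.replace('.', '/');
    }
}
```

For the example in the text, `TagVolumePaths.folderFor("/volume root/tags/", "sys.tag")` yields "/volume root/tags/sys/tag".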

[0131] However, deleting a tag from a file, even if it's the last occurrence of that tag 
anywhere in the system, does not remove the corresponding folder from the tag volume.
This allows users to construct a tag naming convention hierarchy (taxonomy) regardless
of whether those tags are used. The notion of applying a tag on a tag, sometimes referred
to as meta-tagging, is implemented within this tag folder hierarchy. As discussed above,
tags on tags or "metatags" may be used to describe various attributes about a tag. In one
embodiment of the invention, metatags are applied to the sys.file tag by using the 
previously described mechanisms to apply tags to the folder that corresponds to the tag in 
the tag volume. For example, to apply the "tag.type" metatag to the tag called "sys.tag," 
the folder /volume root/tags/sys/tag would be located or created and the "tag.type" tag 
would be applied to that folder. 

[0132] Another aspect of the tag volume is that when a folder is deleted from the tag 
volume, the corresponding tag will be deleted from every file with which that tag is 
associated. A similar mechanism may be used to rename tags. 

[0133] In some embodiments of the invention, attached to the tag nodes in the tag 
volume is a list in the form of a multi-valued tag. This list includes all of the values that 
are associated with that multi-valued tag, as well as markers (in the form of other 
metatags) indicating whether or not additional values are allowed. 

[0134] File management system 500 includes a stream transaction block 550 that 
includes a hash transaction object 551 and an index transaction object 552. These objects 
include requests that are placed on the hash and index queues, respectively, that were 
described above. These objects and their corresponding queues are persistent to maintain 
consistency of files and file modifications and to facilitate recovery from server crashes. 

[0135] In one embodiment of the invention, requests are added onto a queue by one
session and pulled from the queue by another session. But as described above, each
session has a unique and consistent view of the object store. Thus, one session viewing 
the queue within the context of an object store transaction does not see another session 
updating the queue. Once initiated, then, the hash transaction and index transaction 
objects would not see new requests entering the queue. In some conventional systems, 
these objects would periodically abort their session thereby updating their view of the 
object store, in order to see if new requests have arrived. This is a very inefficient 
solution.

[0136] According to one aspect of the invention, this problem is overcome by using a 
parallel non-persistent semaphore to manage these objects and their respective queues. 
When volume 525 is mounted as described above, volume 525 determines a number of 
objects within each queue. For each queue, volume 525 releases a corresponding number 
of semaphores. As threads may only acquire as many semaphores as have been released, 
when a thread attempts to acquire a semaphore object and none are available, the thread 
waits until some other thread releases the corresponding semaphore. 

[0137] When, for example, a hash transaction thread begins, it first attempts to acquire a 
semaphore object. If the thread acquires one, it knows that there must be a corresponding 
object in the persistent queue. The thread may then join an object store session and start 
an object store transaction. The thread then safely pulls an object off the queue and 
begins processing it. 

[0138] Correspondingly, after a new object is placed onto the queue and the 
corresponding transaction is successfully completed, the thread that placed the object
onto the queue releases the corresponding semaphore.

[0139] The semaphore mechanism thus described is important because typically, object 
store 520 does not allow one session to synchronize on objects used by another session 
for this kind of "thread-to-thread" synchronization. In fact, some object stores throw an
exception when that occurs in order to facilitate each session's unique and consistent 
view of the database. 
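
The pairing of a persistent queue with a parallel non-persistent semaphore can be sketched using the standard `java.util.concurrent.Semaphore`. In this illustration an in-memory queue stands in for the persistent queue and the object store transactions are reduced to comments; the class name `SignalledQueue` and its methods are assumptions.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.Semaphore;

final class SignalledQueue<T> {
    // Stand-in for the persistent queue held in the object store.
    private final Queue<T> persistentQueue = new ArrayDeque<>();
    // Non-persistent counter, rebuilt each time the volume is mounted.
    private final Semaphore available;

    /** At mount: one permit is released per request already on the queue. */
    SignalledQueue(Iterable<T> recovered) {
        for (T request : recovered) {
            persistentQueue.add(request);
        }
        available = new Semaphore(persistentQueue.size());
    }

    /** Producer: enqueue within a transaction, release only after commit. */
    void put(T request) {
        synchronized (persistentQueue) {
            persistentQueue.add(request);   // begin, add, commit in the object store
        }
        available.release();
    }

    /** Consumer: holding a permit guarantees a request is on the queue. */
    T take() {
        available.acquireUninterruptibly();     // waits while nothing is queued
        synchronized (persistentQueue) {
            return persistentQueue.remove();    // done inside a fresh transaction
        }
    }

    int pending() { return available.availablePermits(); }
}
```

Because the consumer starts its object store transaction only after acquiring a permit, its snapshot of the queue is taken after the producer's commit, so it never needs to abort and restart merely to look for new work.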

[0140] Once an object is pulled from the queue, hash transaction object 551 reads the
corresponding file and passes the data to a routine that computes a hash code. In one
embodiment of the invention, this hash code is a SHA-1 hash code implemented in Java
as is known. 

[0141] According to one aspect of the invention, once determined, the resulting 160-bit 
hash code is encoded into a relatively human-readable character string. In one 
embodiment, the hash code is encoded into a 32-character string. In this embodiment,
every five bits of the 160-bit hash code are encoded as an ASCII character. The five bits
correspond to 32 values from the ASCII character set, namely:
{0,1,2,3,4,5,6,7,8,9,a,b,c,d,e,f,g,h,i,j,k,n,p,q,r,s,t,u,v,x,y,z}. As noted, four of the
traditional characters from the alphabet were excluded: 1) 'w' because its pronunciation
has multiple syllables and thus takes longer to say; 2) 'o' because it is often confused
with zero; 3) 'm' because it is confused with 'n'; and 4) 'l' because it is often confused
with one. This encoding results in a readily readable string for customer support
purposes, for example. 
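
The encoding can be sketched as follows: 160 bits divided into five-bit groups gives 32 characters, each drawn from the 32-symbol set listed above. The class and method names are hypothetical, and the bit order (most significant bits first) is an assumption, since the description does not state one.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class ReadableHash {
    // The 32 symbols from the description: digits plus letters without l, m, o, w.
    private static final String ALPHABET = "0123456789abcdefghijknpqrstuvxyz";

    /** Encodes 20 bytes (160 bits) as 32 characters, five bits per character. */
    static String encode(byte[] digest) {
        if (digest.length != 20) {
            throw new IllegalArgumentException("expected a 160-bit digest");
        }
        StringBuilder out = new StringBuilder(32);
        int buffer = 0;   // bits read but not yet emitted
        int bits = 0;     // number of valid bits in buffer
        for (byte b : digest) {
            buffer = (buffer << 8) | (b & 0xFF);
            bits += 8;
            while (bits >= 5) {
                bits -= 5;
                out.append(ALPHABET.charAt((buffer >> bits) & 0x1F));
            }
            buffer &= (1 << bits) - 1;   // keep only the leftover bits
        }
        return out.toString();
    }

    /** Computes SHA-1 over the content and encodes the result. */
    static String sha1Readable(byte[] content) {
        try {
            return encode(MessageDigest.getInstance("SHA-1").digest(content));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 unavailable", e);
        }
    }
}
```

Since 160 is an exact multiple of five, no padding is needed and every encoded hash has the same length.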

[0142] The encoded string is stored into a tag whose name is passed as a parameter to the
hash transaction object. In one embodiment, this tag is referred to as "sys.hash.sha-1"
and a request to recompute the hash code is queued whenever a file is modified.

[0143] Index transaction object 552 pulls an object from its queue and constructs a 
request for an external indexing program 555 to index the corresponding file. In one 
embodiment, this external indexing program is a third-party software package referred to 
as Lucene. Other indexing programs are available and could be used as would be
apparent. The external indexing program receives the contents of the file and some
metadata such as the date the file was modified, for example. In one embodiment of the
invention, indexing is performed for only two types of files: text files and HTML files.
These files are comprised of a stream of words readily processed by the external indexing 
program. In other embodiments of the invention, a prefilter first converts binary files 
(such as, for example, PDF files, Word files, etc.) into a stream of words and then passes 
the stream onto the external indexing program. In other embodiments of the invention, 
the external indexing program processes binary files directly as would be apparent. 

[0144] The external indexing program uses a front-end filter 557, referred to sometimes 
as a Grok analyzer 557, that performs various pre-processing steps on the stream of 
words generated from the file being indexed. These steps may include tokenizing the 
stream (determining where the breaks between words are), removing "'s" (apostrophe-s)
from the end of words, removing periods from acronyms, converting words to lower case,
removing common "stop" words (such as "a," "the," "and," "or," etc.) and performing
standard Porter stem filtering (removing common suffixes such as "-ing," "-ed," etc., and
mapping double suffixes to single ones, e.g., "-ize" plus "-ation" maps to "-ize"), etc.
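
Most of the pre-processing steps listed above can be sketched in a few lines. This is not the Lucene analyzer API and not the Grok analyzer itself: the class `SimpleAnalyzer` is hypothetical, tokenization is reduced to splitting on whitespace, the stop list contains only the four words named in the text, and Porter stemming is omitted entirely.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class SimpleAnalyzer {
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "the", "and", "or"));

    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        // Tokenize: split on runs of whitespace.
        for (String raw : text.trim().split("\\s+")) {
            // Trim punctuation from both ends of the token.
            String word = raw.replaceAll("^[^\\p{Alnum}]+|[^\\p{Alnum}]+$", "");
            // Remove a trailing apostrophe-s.
            if (word.endsWith("'s")) {
                word = word.substring(0, word.length() - 2);
            }
            // Remove periods inside acronyms, then lower-case.
            word = word.replace(".", "").toLowerCase(Locale.ROOT);
            // Drop empty tokens and stop words.
            if (word.isEmpty() || STOP_WORDS.contains(word)) {
                continue;
            }
            tokens.add(word);
        }
        return tokens;
    }
}
```

A stemming pass would follow as a further step on each surviving token before it is handed to the indexer.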

[0145] In one embodiment, the resulting text index files from the external indexing
program are stored out in a file system 558 (or file system 508 as would be apparent).
Accordingly, in this embodiment, these text index files are not transactionally secure. In
other embodiments, the resulting text index files are stored in object store 520 as would 
be apparent. 

[0146] File management system 500 also includes a socket manager 580 that is 
responsible for managing incoming connections used as pathways to execute other 
remote commands including XML commands and RMI commands. This mechanism 
provides a parallel or alternate command path to file management system 500 similar to 
that described for system operations through file system interface 502. Socket manager
580 handles XML commands. When a client attempts to connect to the server on a
specific port, socket manager 580 receives that connection. Socket manager 580 
manages the number of connections, creates socket reader object 571 and socket writer 
object 572, and delegates subsequent read and write operations to the corresponding 
object. In one embodiment, these sockets are full duplex, thereby enabling parallel 
reading and writing as would be apparent. 

[0147] Socket reader object 571 reads the socket, packages each XML command packet, 
attaches it to an object, and places that object onto a queue. Socket writer object 572, on 
the other hand, reads a queue, serializes those objects from the queue, and outputs them 
to the output socket. 

[0148] Socket worker objects 565, which run in their own separate thread pools, pull
requests off of the corresponding input queue, parse the corresponding XML command,
determine a necessary action and, in some instances, actually execute many of the tasks
associated with these particular commands. More complex commands may be dispatched
to appropriate objects that know how to perform those functions. 

[0149] For example, in one embodiment of the invention, commands to manipulate tags 
(i.e., getting tags, setting tags, removing tags, etc.) may enter file management system 
500 as XML commands via socket worker 565. After parsing the XML command, socket 
worker 565 performs path name lookups, etc., that may be required to obtain either a slot or 
an entry object and/or to set/remove tags, set/read/remove attributes, etc. 

[0150] Socket worker 565 is also responsible for constructing an appropriate response to 
the client for the requested operation. For example, if the incoming request asked for all 
of the tags associated with a particular file, socket worker 565 would first access volume 
manager 515 and parse the path name associated with the particular file into a slot object. 
Then, using the slot object, socket worker 565 accesses the corresponding entry object. 
The entry object includes methods that, for example, determine which tags are associated 
with that entry object. Using that data, socket worker 565 constructs an XML DOM 
object, which represents the response. Once constructed, socket worker 565 queues the 
DOM object up to the corresponding socket writer 572 associated with the client that 
issued the original request. 
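
The response construction described above can be sketched as follows. This is a minimal illustration in Python rather than the Java of the described embodiment. The element and attribute names (`response`, `tag`, `id`, `path`, `name`) and the function `build_tags_response` are hypothetical, since the XML schema is not specified here.

```python
import xml.etree.ElementTree as ET


def build_tags_response(request_id, path, tags):
    """Build an XML response listing the tags on one file.

    All element and attribute names are illustrative assumptions.
    """
    root = ET.Element("response", {"id": str(request_id), "path": path})
    for name, value in sorted(tags.items()):
        tag_el = ET.SubElement(root, "tag", {"name": name})
        tag_el.text = value
    return root


if __name__ == "__main__":
    doc = build_tags_response(7, "/docs/report.doc", {"sys.types": "report"})
    print(ET.tostring(doc, encoding="unicode"))
```

In such a sketch, the returned element tree plays the role of the DOM object that would be queued for the socket writer.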

[0151] In one embodiment, the requests are tagged with ID numbers thereby allowing 
file management system 500 to operate completely asynchronously. This allows a client 
to submit many requests, one right after the other, without waiting for the responses to 
come back. Those requests are then queued and subsequently processed by a pool of 
socket workers. As the requests are completed (and not necessarily in the order in which 
they were received) and responses are constructed and placed on the output queue, socket 
writer 572 sends them out with the same ID marker associated with the original request. 
The client can then correlate the responses with the requests. 
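
The ID-based correlation can be illustrated with a small sketch, assuming a thread pool stands in for the socket workers and an in-memory queue stands in for the output queue. The names `handle`, `serve`, and `correlate` are hypothetical, and `handle` is only a placeholder for parsing and executing a command.

```python
import queue
from concurrent.futures import ThreadPoolExecutor


def handle(command):
    """Stand-in for parsing and executing one command (hypothetical)."""
    return command.upper()


def serve(requests, workers=4):
    """Process (request_id, command) pairs on a worker pool.

    Responses are placed on an output queue as they complete, each
    carrying the ID of the request it answers.
    """
    output = queue.Queue()

    def work(item):
        request_id, command = item
        output.put((request_id, handle(command)))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() waits for every task and re-raises any worker error.
        list(pool.map(work, requests))

    responses = []
    while not output.empty():
        responses.append(output.get())
    return responses


def correlate(requests, responses):
    """Client side: match each response to its request by ID."""
    by_id = dict(responses)
    return {command: by_id[request_id] for request_id, command in requests}
```

Because each response carries its request ID, the client does not depend on the order in which responses arrive.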

[0152] File management system 500 also includes a notification object 560. At various 
points within the operation of file management system 500, such as when a new file or 
folder is added or when tags change in certain ways, certain events can be generated. 
According to one aspect of the invention, these events may generate XML messages that 
are sent to a client, in some instances, completely asynchronously. In order for the client 
to indicate its readiness to receive these events, the client sends a specific command 
referred to as a watch list command. The client collects the names of folders referred to 
by open windows on the client and forwards that as a watch list to the server. In this 
way, the server now knows which folders every user has open on every connection on 
every desktop. Whenever a new file is created, file management system 500 searches the 
watch lists of open folders to determine if any clients currently have a folder open that 
includes the newly created file. If so, then a corresponding event is sent asynchronously 
to all of those clients. According to various aspects of the invention, this mechanism 
works similarly for regular folders, search folders, and/or query folders. A similar 
mechanism also works for tags where if a tag is changed on a file that is currently open 
on a user's desktop, then that user will receive an asynchronous event saying that that tag 
has been updated. 

[0153] Events may be scheduled to occur when, for example, a tag or file is deleted from 
any one of these open folders, a file is renamed, etc. Various objects in file management 
system 500 track which socket writer 572 or socket reader 571 corresponds to which 
user. In other words, within file management system 500 there exists a so-called "back 
path" from the watch list of open folders to the user. This back path enhances the lookup 
process, making it extremely fast. In one embodiment, the names of the folders are 
stored in hash tables with the output being a set of socket readers or socket writers that 
correspond to that particular user. Once this set is determined, an XML notification 
message may be constructed and queued for the corresponding socket writer. 
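
A minimal sketch of such a hash-table lookup follows, assuming connections are identified by opaque keys. The class `WatchLists` and its method names are hypothetical and are not taken from the described system.

```python
from collections import defaultdict


class WatchLists:
    """Maps folder names to the connections that have them open."""

    def __init__(self):
        self._by_folder = defaultdict(set)  # folder name -> connection ids
        self._by_conn = defaultdict(set)    # connection id -> folder names

    def set_watch_list(self, conn, folders):
        """Replace the watch list for one client connection."""
        for folder in self._by_conn.pop(conn, set()):
            self._by_folder[folder].discard(conn)
        for folder in folders:
            self._by_folder[folder].add(conn)
            self._by_conn[conn].add(folder)

    def watchers(self, folder):
        """Return the connections to notify about a change in folder."""
        return set(self._by_folder.get(folder, ()))
```

For example, after `set_watch_list("c1", ["/a", "/b"])`, a file created in "/b" would yield `watchers("/b")` containing "c1", and a notification could be queued for that connection's writer.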

[0154] File management system 500 also includes an RMI interface 582 that operates in 
a manner similar to socket manager 580, the difference being no XML in the RMI 
procedure call. In one embodiment, socket manager 580 and RMI interface 582 share 
common code (i.e., code exclusive of XML parsing, etc.) referred to herein as core calls 
584. Core calls 584 correspond to the common operations between the RMI interface 
and the XML interface. 

[0155] Other functions that may be included in various embodiments of file management 
system 500 may include logging, unit testing, miscellaneous utilities, etc. These 
functions are generally well known and may either be incorporated into the system or 
integrated therewith as third party tools. 

[0156] Another function that may be included in file management system 500 is an ID 
number manager (not illustrated). All file system node objects 530, including slots 532, 
entry objects 534, streams 538 and containers 537, have associated therewith an ID 
number. This ID number is unique on a per-volume basis. In some embodiments of the 
invention, the ID number is used to name the underlying blob on file system 508 that 
corresponds to this node object. As described above, each stream object 538 refers to a 
blob on file system 508 that corresponds to that stream, and the name of that blob is the 
ID number of that object. 

[0157] In some embodiments of the invention, ID numbers may be used to look up 
objects by their number, for example, with the free-text search index. When a file is 
indexed in the free-text search sense, its file name is not stored in the index. Otherwise, 
any time the file is renamed, it would have to be re-indexed. Instead, the ID number is 
used as the name of the index. When a lookup is performed during a free-text search, the 
returned hits include the ID numbers corresponding to the objects that were found. This 
ID number is used to determine which stream objects and accordingly, which entry 
objects and which slot objects are implicated. From the slot objects, the name of the 
object can be determined. Using ID numbers in the index also facilitates a single index 
file regardless of whether the corresponding file is linked, live copied, a deferred copy, 
etc., as only one instance of that file resides on the disk and thus having multiple index 
files is unwarranted. 

[0158] ID number manager assigns the ID numbers. According to one aspect of the 
invention, ID numbers are anchored in volume object 525. Because of the manner in 
which object store 520 operates, if each session were to access the volume object for a 
new ID number as the objects were created, a significant number of write/write collisions 
against the volume object would result. Instead, ID number manager operates using a 
single thread to assign the ID numbers. 

[0159] At start up, ID number manager requests a block of ID numbers from the volume 
object and places them one at a time onto a synchronized queue. While this queue is not 
persistent, the volume number update process is. More particularly, when the ID number 
manager asks for a block of ID numbers, that request is done in a persistent fashion: the 
updated volume object is written back to the object store so that the block that was 
requested is "remembered" if the file management system 500 were to crash. However, 
the queue in which these objects are placed is not persistent. Instead, the ID number 
manager writes only so many of the ID numbers, one at a time, to the synchronized 
queue. Thus, this queue has a limited depth. Furthermore, the ID number manager only 
has a limited number of these objects that it originally fetched from the volume object. 

[0160] In some embodiments, the ID number manager writes a few of these ID numbers 
into this queue and suspends until another thread removes a number from the queue. 
Threads requesting an ID number in order to create file system objects remove a number 
from the queue. In order to overcome problems associated with this queue being non- 
persistent, when the ID number manager has placed all of the ID numbers that it fetched 
from the volume manager on the queue, the ID number manager requests another block 
of ID numbers through an object store transaction. In this way, the volume object need 
only periodically re-persist to disk (i.e., update object store) based on the number of ID 
numbers fetched at any given time from the volume object. 
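
The block-reservation scheme can be sketched as follows. This is an illustration only: `VolumeObject` is a hypothetical in-memory stand-in for the persistent volume object, `persist_count` merely counts simulated object store writes, and the block size and queue depth are arbitrary.

```python
import queue
import threading


class VolumeObject:
    """Stand-in for the persistent volume object (hypothetical)."""

    def __init__(self):
        self.next_unreserved = 1
        self.persist_count = 0  # counts simulated object store writes

    def reserve_block(self, size):
        start = self.next_unreserved
        self.next_unreserved += size
        self.persist_count += 1  # the new high-water mark is persisted
        return range(start, start + size)


class IdNumberManager:
    """Single thread feeding reserved ID numbers into a bounded queue."""

    def __init__(self, volume, block_size=10, depth=3):
        self._volume = volume
        self._block_size = block_size
        self._queue = queue.Queue(maxsize=depth)  # non-persistent, limited depth
        thread = threading.Thread(target=self._run, daemon=True)
        thread.start()

    def _run(self):
        while True:
            for id_number in self._volume.reserve_block(self._block_size):
                self._queue.put(id_number)  # blocks while the queue is full

    def next_id(self):
        """Called by threads that create file system objects."""
        return self._queue.get()
```

In this sketch, a restart after a crash begins from the persisted high-water mark, so numbers left in the lost queue are skipped rather than reused, and the volume object is written once per block rather than once per ID number.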

[0161] The tag volume is now described in further detail. As implemented in one 
embodiment of the invention, tag volume is implemented as a tag folder hierarchy. As 
described above, tags in file management system 500 are reflected into file system as 
folder names. This is done by replacing the dots in a tag name with slashes, and then 
appending the resulting string to the root path of the tag volume. For example, with a tag 
volume root path of "/volume root/tags/", a tag referred to as "sys.types" would be 
reflected in the file system as a folder named "/volume root/tags/sys/types." 
Furthermore, the folders corresponding to each tag are created at the time that the tags are 
first created. 
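
The mapping from tag name to folder path amounts to a simple string transformation, sketched here with a hypothetical function name `tag_to_folder`:

```python
def tag_to_folder(tag_name, tag_root="/volume root/tags/"):
    """Map a dotted tag name to its folder path under the tag volume root."""
    return tag_root.rstrip("/") + "/" + tag_name.replace(".", "/")


if __name__ == "__main__":
    print(tag_to_folder("sys.types"))
```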

[0162] As also described above, each tag can have one or more metatags applied to it. 
One purpose of the metatags is to affect the behavior of the tags to which they are 
applied. These metatags are now described in further detail. 

[0163] Each tag may include a type that is enforced at the time that the tag is set. One 
type of tag is a user type. A tag of user type has a value of the form of domain name/user 
name. Another type of tag is a date type. A tag of date type has an ISO standard date 
form. Another type of tag is an icon type. A tag of icon type must include a value that 
represents the name of an icon file found in the /volume root/tags folder. Another type of 
tag is a hash type. A tag of hash type has a form of a 35-character long string (for 
encoded representation of SHA-1 hash code). Another type of tag is a trigger type. A 
trigger is the name of a Java class that will be verified to ensure that it exists, and 
that it is derived from the right subclass type to be a valid trigger. Another type of tag is 
a boolean type. A tag of boolean type can only be set to true or false. Other values are 
not allowed. Another type of tag is an email type. A tag of email type must include a 
properly formatted e-mail address including a user name and host name. Another type of 
tag is a password type. A tag of password type has the form of any string, but with the 
property of returning a string of asterisks (for example) rather than its exact value when 
the tag is read. Other tag types may exist as would be apparent. 
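
Type enforcement at the time a tag is set might look like the following sketch, which covers only some of the types listed above. The function names, the type name strings, the simplified e-mail pattern, and the choice to mask a password with one asterisk per character are all assumptions made for illustration.

```python
import re
from datetime import date

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+$")  # user name and host name


def validate_tag_value(tag_type, value):
    """Return True if value is acceptable for the given tag type."""
    if tag_type == "boolean":
        return value in ("true", "false")
    if tag_type == "date":
        try:
            date.fromisoformat(value)  # accepts YYYY-MM-DD
            return True
        except ValueError:
            return False
    if tag_type == "email":
        return EMAIL_RE.match(value) is not None
    if tag_type == "user":
        domain, sep, user = value.partition("/")
        return bool(domain and sep and user)
    if tag_type == "password":
        return True  # any string is allowed
    raise ValueError(f"unknown tag type: {tag_type}")


def read_tag_value(tag_type, value):
    """Mask password tags on read; return other values unchanged."""
    return "*" * len(value) if tag_type == "password" else value
```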

[0164] Another metatag that is enforced on the volume manager is one that controls 
whether new values may be set. When set, this metatag will not allow new values to be 
created for that tag. Another metatag records all current and past values for a particular 
tag. Whenever a new tag value is set for a particular tag name, this metatag, referred to 
as "tag.values," is updated 
so that it includes a current list of all the values that have ever been applied to that 
particular tag. This allows users to determine, by browsing the tag volume, which of the 
values of the tags are actually being used. Tags may also include a default value so that 
when the tag is set the default is used if no other value is provided. An owner of the tag 
may also be specified. This may be used to limit who can add, modify, delete, view, etc., 
certain tags. 

[0165] Tags may be assigned to a tag group, for example, by setting the "tag.group" 
metatag. Tags that have the same value for the "tag.group" metatag are considered to 
belong to the same tag group. When a single tag that belongs to a particular tag group is 
applied to a file, all of the other tags in that same tag group are also applied to that file. 
Similarly, when a tag belonging to a particular tag group is deleted from a file, all of the 
other tags in that tag group are also deleted. Tags in tag groups are intended to be applied 
and removed together. In some embodiments, if one tag in a tag group is changed and if 
any tag in the tag group has a trigger associated with it, the trigger will fire (whereas 
normally only the trigger associated with the tag that is changed would be fired). 
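
The apply-together and remove-together behavior of tag groups can be sketched as below. The dictionary `tag_groups` (tag name to "tag.group" value) and the function names are hypothetical, and the values given to the other tags in the group, taken from optional defaults or left empty, are an assumption.

```python
def group_members(tag_groups, tag):
    """Return every tag sharing the tag's "tag.group" value, itself included."""
    group = tag_groups.get(tag)
    if group is None:
        return {tag}
    return {name for name, g in tag_groups.items() if g == group}


def apply_tag(file_tags, tag_groups, tag, value="", defaults=None):
    """Apply a tag and, with it, every other tag in the same group."""
    defaults = defaults or {}
    for name in group_members(tag_groups, tag):
        if name == tag:
            file_tags[name] = value
        else:
            file_tags.setdefault(name, defaults.get(name, ""))


def remove_tag(file_tags, tag_groups, tag):
    """Remove a tag and every other tag in the same group."""
    for name in group_members(tag_groups, tag):
        file_tags.pop(name, None)
```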

[0166] In some embodiments of the invention, a metatag of type trigger may be assigned 
to a tag in the tag folder hierarchy. As described above, this corresponds to a Java class 
that gets invoked at various points in the operation of file management system 500. For 
example, triggers may be attached to file operations including opening, closing, reading, 
and/or writing of a file. Triggers may also be attached to metadata operations including 
changing a tag or changing an attribute. In addition, periodic triggers may be invoked as 
would be apparent, without touching the system in any other way. Triggers may perform 
any number of operations including sending an e-mail, setting various tags, performing 
file operations, writing out to a log file, creating a new file based on some event, 
adjusting and/or modifying file attributes, freezing a file, etc., or any other operation that 
could be programmed using, for example, Java code. 

[0167] An example of a trigger is now described. One type of trigger contemplated by 
the invention is referred to as an approval trigger. The approval trigger is set up to fire 
whenever any approval-related tag changes. The approval trigger sets several approval 
status tags to indicate who has approved a file and who has not, including the various 
icon designations. These tags are then later interpreted by the user interface. This is 
all done based on a list of required approvers that is also attached to the file. The 
approval trigger may also send an e-mail if so designated by a tag attached to the file or 
by a metatag attached to one of the tags. The approval trigger may also freeze the file, if 
so designated, once all of the approvers have approved the file. 

[0168] File management system 500 manages a set of approval-based triggers. In some 
embodiments, this set of triggers is managed on a user-by-user basis, so these tags may 
all include the security authentication domain and user name of the user who approved 
the file. For example, one tag associated with the approval might correspond to a date tag 
with the name "sys.signature.domain.user.date." According to the invention, these tags 
are applied through a signature XML or RMI call rather than directly by the user. This 
ensures that a formal approval process is followed, that certain requirements have been 
met, that the users have been authenticated, etc. 

[0169] One embodiment of the invention implements four approval-based tags. These 
include a date tag, a hash code tag associated with the file, a status of the approval (for 
example, "signed" or "rejected"), and the approver's comments relating to their approval 
or rejection. 

[0170] In addition to the approval-based tags, this embodiment may also include a set of 
tags used to control whether other tags (such as the approval-based tags) are required on 
all the files that go into a folder. When these tags are set on a folder, every time a file 
is created in or moved into that folder, file management system 500 will require that the 
other tags are set; if not, the create or move operation will not be allowed. 

[0171] Another mechanism exists in file management system 500 similar to the tag 
volume described above. This mechanism is referred to as a user volume or a user folder 
hierarchy. As with the tag volume, all users of file management system 500 are reflected 
into the file system as a directory of their corresponding user IDs. For a user "rick" 
in domain "grokker," there would be a folder in file system 530 named "/volume 
root/users/grokker/rick." As described above, any number of tags can be attached to that 
folder to in effect describe that user. For example, these tags could include a human- 
friendly user name including a first name and a last name, an e-mail address, a password, 
a preferred language, as well as authentication tokens and pointers to authentication 
servers, etc. This folder may be linked to other folders thereby designating groups or 
roles for permission and access purposes. 

[0172] File management system 500 as thus described provides a framework for 
implementing various aspects of the invention that will now be described. The first of 
these aspects is "live copy" and "smart links." As described above, any file in file system 
530 has associated with it a slot 532, an entry 534, and a stream 538. When a live copy 
or smart link command is issued with respect to this file, the file system creates a second 
slot 532 that points to the existing entry 534, and thus the same stream 538. As has been 
described above, slots 532 include name information and entries 534 manage tags, and 
further, multiple slots 532 can point to a single entry 534. Thus, after the second slot is 
created, the file system, in effect, manages two names for the same underlying object. 
The live copy command also attaches a trigger to the second slot. This trigger is fired 
when the file is opened or closed, and manages the synchronization with remote systems. 

[0173] A similar mechanism may also be used for smart caching and smart backup. A 
cache or backup trigger is attached to a file so that when the file is opened or closed, the 
trigger can access a remote cache, synchronize a local copy, or in the case of a backup, 
send the modified file off to a backup store. 

[0174] Deferred copies are implemented using a slot and entry pair. The file system 
permits more than one slot-entry pair to point to the same underlying item 536. As 
described above, the slot manages the name (so the underlying item can have multiple 
names) and the entry manages the tags (implying that the underlying item can have 
different sets of tags). The deferred copy command creates a second slot-entry pair 
pointing to the same underlying item. The deferred copy provides extremely fast server 
side copies of an item because the underlying item (including its associated blob, in the 
case of a stream) is not copied. When the underlying item is opened for writing or 
modification, the volume manager detects the multiple entries pointing to the same item 
and only then is a copy of the underlying item made. At that time, the second slot-entry 
pair is adjusted to point at the copy as would be apparent. 
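
The relationship among slots, entries, and items, and the difference between a link and a deferred copy, can be sketched as follows. The class and function names are hypothetical. Two simplifications are assumed: the deferred copy starts with a copy of the original tags, and the copy made on write is given to whichever entry is being opened for writing, whereas the description above adjusts the second slot-entry pair.

```python
class Item:
    """Underlying content (for a stream, the blob on disk)."""

    def __init__(self, data):
        self.data = data


class Entry:
    """Holds the tags and points at an item."""

    def __init__(self, item, tags=None):
        self.item = item
        self.tags = dict(tags or {})


class Slot:
    """Holds a name and points at an entry."""

    def __init__(self, name, entry):
        self.name = name
        self.entry = entry


def link(slot, new_name):
    """Live copy or smart link: a second slot on the same entry."""
    return Slot(new_name, slot.entry)


def deferred_copy(slot, new_name):
    """Deferred copy: a new slot-entry pair on the same item."""
    return Slot(new_name, Entry(slot.entry.item, slot.entry.tags))


def open_for_write(slot, all_entries):
    """Copy the item only if another entry still shares it."""
    entry = slot.entry
    shared = any(e is not entry and e.item is entry.item for e in all_entries)
    if shared:
        entry.item = Item(entry.item.data)  # the deferred copy happens here
    return entry.item
```

Until a write occurs, both slot-entry pairs refer to one item, which is why the copy itself costs almost nothing on the server.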

[0175] Identical files are detected using the hash code described above. Whenever a file 
is modified and closed, a background thread calculates a new hash code for that file. The 
new hash code is stored in a tag associated with that file. This causes, through a trigger 
mechanism, file management system 500 to compare the new hash code with the hash 
codes of other files in the system to identify identical files in the file system. According 
to one embodiment, the file system objects, namely the slot-entry pairs, are rearranged to 
resemble a deferred copy, and the duplicate blob is removed from disk. Identical files are 
thus combined thereby freeing disk space. 
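
Hash-based detection of identical files can be sketched as follows, assuming file contents are available as bytes. The hexadecimal encoding of the SHA-1 digest and the function names are choices made for this sketch and do not reflect the encoded form used for the hash tag described above.

```python
import hashlib


def content_hash(data):
    """SHA-1 digest of a file's contents, as a hex string."""
    return hashlib.sha1(data).hexdigest()


def find_duplicates(files):
    """Group file names by content hash; keep groups with more than one file.

    `files` maps a file name to its contents as bytes.
    """
    by_hash = {}
    for name, data in files.items():
        by_hash.setdefault(content_hash(data), []).append(name)
    return [sorted(names) for names in by_hash.values() if len(names) > 1]
```

Each group returned by `find_duplicates` is a candidate for being rearranged to share one underlying blob.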

[0176] Frozen files are implemented by attaching a frozen attribute as a boolean field to 
an entry object associated with the file. Whenever this file is opened, this field is 
examined to determine the allowed operations. Nothing happens if the file is opened for 
reading. However, if the file is opened for writing or creating, an error will be thrown and 
that operation will be prevented. In some embodiments, this field may also be examined 
when tags are set so that tags on a frozen file cannot be modified, added, deleted, etc. In 
one embodiment of the invention, a frozen file is akin to a permanent read only file, 
including its tags. In various embodiments of the invention, the only operations allowed 
on a frozen file are reading and renaming. 
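
The check performed on open can be sketched in a few lines. The exception `FrozenFileError`, the function `check_open`, and the mode strings are hypothetical.

```python
class FrozenFileError(Exception):
    """Raised when a disallowed operation targets a frozen file."""


def check_open(entry_frozen, mode):
    """Allow reads always; reject write or create opens on a frozen entry."""
    if entry_frozen and mode in ("write", "create"):
        raise FrozenFileError(f"cannot open frozen file for {mode}")
    return True
```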

[0177] Query folders are implemented through query tags attached to the folder. Query 
tags differ from other tags described above in that they can only be attached to empty 
folders. When these tags are set, special links are made to all of the files that match the 
query. These links are updated when either the query tags change or when one of the 
files matching the query changes. 

[0178] Search folders are implemented in a similar fashion; however, instead of 
performing a search using the tag mechanism described above, the search folder utilizes a 
free-text search engine. As described above, the search engine returns the file ID based 
on a provided search string and the file ID is used to get the file name. 

[0179] File versions are created automatically, either when a user does a file create on 
top of an existing file, or when file management system 500 detects a renaming sequence. 
For example, Microsoft Word uses a renaming sequence that renames the original file to 
a backup file and then renames a temporary file to the name of the original file. The file 
system implements and manages versions by maintaining a linked list of entries with 
various state bits that control whether or not those entries are shown in directories when 
the directories are enumerated. When the directory is enumerated, the file system uses 
these state bits to determine which versions to display based on, for example, user 
preferences. In one embodiment, older versions of files have an ISO standard date 
encoded into their names for use and discrimination by other systems, along with the 
word "version". This encoding also avoids name collisions as would happen, for 
example, if all the versions had the same name as the original file. In some 
embodiments, automatically-created versions can also be renamed with a name chosen by 
the user. 
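
One possible way to encode a date and the word "version" into an older version's name is sketched below. The exact layout of the name is not specified above, so the format produced by the hypothetical function `version_name` is purely an assumption.

```python
from datetime import datetime


def version_name(original_name, created):
    """Build a name for an older version (the layout is an assumption)."""
    stem, dot, ext = original_name.rpartition(".")
    if not dot:
        stem, ext = original_name, ""
    # Colons are omitted from the time so the name stays valid on more systems.
    stamp = created.strftime("%Y-%m-%dT%H%M%S")
    suffix = f" (version {stamp})"
    return f"{stem}{suffix}.{ext}" if ext else f"{stem}{suffix}"
```

Because the timestamp differs for each version, names built this way do not collide with the original file name or with one another, provided versions are not created within the same second.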

[0180] Copy pedigrees are also implemented by file management system 500. When 
copies are created using, for example, a server side copy command, the server tracks 
these copy operations by having each entry object forward point to a collection of other 
entries that are copies thereof. Likewise, each entry object may also backward point to 
the entry from which it was copied. File management system 500 responds to 
appropriate XML and RMI commands to present these copy pedigrees in a user 
interface in an appropriate form to illustrate the migration of copies from place to place. 

[0181] Undeleting files is implemented as set forth below. As files are deleted, their 
corresponding slot objects are renamed and a field in the slot object is set to indicate that 
the slot has been deleted. When a directory is enumerated, deleted slots are not shown. 
This process is reversed when a file is undeleted. The field in the slot is unset and the 
name is changed back to its original value. In an analogous way to versions, deleted 
filenames are marked with the string "deleted" and the date that the file was deleted. 
When these files are undeleted, their names are marked with the string "undeleted" and 
the date that they were undeleted. File management system 500 responds to an 
appropriate XML or RMI command to toggle a per-user boolean value, managed in 
container 537, which in turn controls whether the deleted files are shown when the 
corresponding user enumerates the container. With this field enabled, users can see 
deleted files in the same context where they were originally located. 
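
The delete and undelete handling can be sketched as follows. The class and function names and the layout of the renamed slot are hypothetical. For simplicity the sketch restores the original name on undelete and does not model the "undeleted" marking.

```python
from datetime import date


class Slot:
    """A named slot with a deleted flag."""

    def __init__(self, name):
        self.name = name
        self.original_name = name
        self.deleted = False


def delete(slot, today):
    """Mark the slot deleted and rename it (name layout is an assumption)."""
    slot.original_name = slot.name
    slot.name = f"{slot.name} (deleted {today.isoformat()})"
    slot.deleted = True


def undelete(slot):
    """Clear the deleted flag and restore the original name."""
    slot.name = slot.original_name
    slot.deleted = False


def enumerate_folder(slots, show_deleted=False):
    """List names, hiding deleted slots unless the per-user flag is set."""
    return [s.name for s in slots if show_deleted or not s.deleted]
```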

[0182] Type folders are implemented with a special tag on the folder that file 
management system 500 examines prior to allowing a file to be added there. If the file 
does not match the specified type, the system will not allow the file to be placed in that 
folder. 